INVESTIGATING STATISTICAL CONCEPTS, APPLICATIONS, AND METHODS,
Third Edition
NOTES FOR INSTRUCTORS
August, 2016
CHAPTER 5: COMPARING SEVERAL POPULATIONS, EXPLORING RELATIONSHIPS
This chapter again extends the lessons students have learned in earlier chapters to additional methods involving more than one variable. The material focuses on the new methods (e.g., chi-square tests, ANOVA, and regression) and follows the same progression as earlier topics: starting with exploring appropriate numerical and graphical summaries, proceeding to use simulations to construct empirical sampling/randomization (reference) distributions, and then considering probability models for drawing inferences. The material is fairly flexible if you want to pick and choose topics. Fuller treatments of these methods would require a second course.
Section 1: Two Categorical Variables
In this section we start with a k x 2 case and then expand to larger tables. We again start with a simulation and then apply the chi-square distribution and examine more traditional output.
Fall 2015: We have reworked this section to have the following progression:
· Comparing several population proportions (independent random samples)
· Comparing several population distributions (random sampling, response not binary)
· Comparing several treatment probabilities (randomized experiment)
· Association between two variables (one random sample)
The first three are highlighted as tests of homogeneity and the last as a test of association, but you may choose to worry less about this distinction. So that the entire chapter would not need to be renumbered, the newspaper credibility study was moved to Investigation 5.1A and expanded to three years. We have phased out the Minitab macros/R scripts, but they can still be accessed on the Data Files page.
Investigation 5.1: Dr. Spock's Trial
Timing/materials: Technology is used to create empirical sampling distributions (based on the binomial sampling model) in Investigation 5.1. There are Minitab and JMP macros (e.g., SpockSim.mac) and an R script (Spock.R) that can be used. This investigation will take approximately 50-60 minutes.
In Investigation 5.1 you may want to spend a bit of time on the background process of the jury selection (e.g., venires) before students analyze the data. We also encourage you again to stop and discuss the numerical and graphical summaries that apply to these sample data (question (a)) before proceeding to inferential statements, as well as reconsidering the constant theme – could differences this large have plausibly occurred by chance alone? Once you get to inference, the key change is needing a more complicated test statistic (part c), and you may want to ask students to spend more time on that, even developing their own statistics (e.g., sum of all pairwise differences). Students then explore one possible statistic called the mean absolute difference. This is followed by the more traditional chi-square statistic, with a focus on how it measures departures from what the null hypothesis predicts and why we only consider larger values as "more extreme." After thinking about the behavior based on the formula, we then use simulation as a way of judging what values of the statistic should be considered large enough to be unusual and also to see what probability distribution might approximate the sampling distribution of the test statistic. Because we are simulating the drawing of independent random samples from several populations, we use the binomial distribution, as opposed to treating both margins as fixed as we did in Chapter 3. This is done by the R, JMP, and Minitab macros. The simulations randomly generate male/female counts for each judge's panel by sampling from a binomial distribution with n_i equal to judge i's sample size and assuming p = .261 for the probability of a juror being female for each judge. [Remember in R to save the file using quotation marks or use the .txt extension when you call the macro. In fact, it may add on .txt even if you use the quotation marks. In Minitab, the resulting totals for each gender are stored in C2-C8.] Then the expected counts are computed based on the simulated total numbers of males and females, and the "observed" counts are compared to the expected counts. The 14 terms of the chi-square sum and the resulting chi-square statistic for each sample are computed. [Make sure students using Minitab remember to save both the .mac file and an empty worksheet file in the same folder, and students using R define the variables they want to store results in (e.g., mychisq=0).] Using R, Minitab, or JMP will allow you to change the df to try different theoretical models.
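For instructors who want to see the logic without the macros, here is a minimal R sketch of the simulation just described. It is not the Spock.R script itself, and the panel sizes below are placeholders; use the judges' actual venire sizes from the data file.

    # Simulate independent binomial samples for each judge and compute the
    # chi-square statistic for each simulated 2 x 7 table.
    n <- c(300, 86, 100, 209, 76, 85, 200)  # hypothetical panel sizes, one per judge
    p <- 0.261                              # common probability of a female juror
    reps <- 1000
    mychisq <- numeric(reps)
    for (i in 1:reps) {
      females <- rbinom(length(n), size = n, prob = p)
      observed <- rbind(females, males = n - females)
      # expected counts based on the simulated row and column totals
      expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
      mychisq[i] <- sum((observed - expected)^2 / expected)  # the 14 terms
    }
    hist(mychisq)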
The simulation results are also used to help students see that the normal distribution does not provide a reasonable model of the empirical sampling distribution of this test statistic. We do not derive the chi-square distribution but do use probability plots to show that the Gamma distribution, of which the chi-square distribution is a special case, is appropriate (questions (q) and (r)). Again, we want them to realize that the probability model applies no matter the true value of the common population proportion p. We also encourage them to follow up a significant chi-square statistic by seeing which cells of the table contribute the most to the chi-square sum as a way of further defining the source(s) of discrepancy (questions (u) and (v)). A practice problem on Type I Errors is used to further motivate the use of this "multi-sided" procedure for checking the equality of all population proportions simultaneously.
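Continuing the sketch above, one quick way to make this kind of probability-plot comparison in R (with 7 groups, the relevant degrees of freedom are 7 - 1 = 6):

    qqplot(qchisq(ppoints(reps), df = 6), mychisq,
           xlab = "Chi-square(6) quantiles", ylab = "Simulated statistics")
    abline(0, 1)  # points near this line indicate a good fit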
Investigation 5.1A: Newspaper Credibility Decline
This investigation starts with some study design and data wrangling issues and then focuses on comparing three groups on a categorical but non-binary response variable. The formal expression for expected counts is introduced. Students also compare the chi-square results for a 2x2 table to a two-sample z-test.
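For reference, the standard expression introduced there is

    E_{ij} = \frac{R_i \, C_j}{N}

where R_i is the row i total, C_j is the column j total, and N is the grand total.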
Investigation 5.2: Teaching Morals
This homework exercise has been made an investigation to focus on the randomized experiment. Students use a different simulation method but find that the chi-square distribution is still a reasonable mathematical model for the chi-square statistic. The Analyzing Two Way Tables applet is used. You can also use the applet to explore other statistics like Mean Abs Diff and Max – Min.
[Note: Output windows can be enlarged to improve layout. When you check the Show X2 output box, the theoretical curve also overlays on the empirical randomization distribution. In the applet, you can also toggle which column is the explanatory and which is the response, as well as which outcome is defined as success. When you enter larger tables, the MAD statistic is only calculated using one row as success, though you can select different rows.]
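A minimal R sketch of the shuffling logic behind the applet's simulation, with made-up group and response labels (all names and counts here are illustrative):

    # Re-randomize the response column many times and recompute the
    # chi-square statistic to build the randomization distribution.
    treatment <- rep(c("A", "B", "C"), times = c(30, 30, 30))
    response  <- sample(rep(c("improved", "not"), times = c(40, 50)))
    obs <- chisq.test(table(treatment, response))$statistic
    shuffled <- replicate(1000,
      chisq.test(table(treatment, sample(response)))$statistic)
    mean(shuffled >= obs)  # empirical p-value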
Investigation 5.3: Night Lights and Near-Sightedness (cont.)
Timing/materials: You may want to get students to realize that the randomization modeled here could be different from that in the earlier investigations, though we end up with the same theoretical model. You can also focus more on using Minitab, R, JMP, or the applet to generate the chi-square analysis.
Investigation 5.3 provides an application of the chi-square procedure, but in the case of a cross-classified study. You might want to start by asking them what the segmented bar graph would have looked like if there were no association between the two variables. The "no association" model can also be simulated through a randomization test, building on earlier simulations with quantitative response data.
In addition to the practice problems, you may also wish to give students more practice in applying the procedure, especially in distinguishing these different situations (e.g., reminding them of the different data collection scenarios, the segmented bar graphs, and the form of the hypotheses: comparing more than two population proportions, comparing population distributions on a categorical variable, or association between categorical variables).
Section 2: Comparing Several Population Means
Timing/materials: This section now centers on the Comparing Groups (Quantitative) javascript applet for descriptive analysis and simulation (random assignment). It produces standard ANOVA output, which can also be compared to output from a statistical package. This section should take about 65 minutes.
The focus of this section is on comparing two or more population means (or treatment means). You may want to cast this as the association between one categorical and one quantitative variable to parallel the previous section (though some suggest only applying this description to cross-classified studies). Again, we do not spend a large amount of time developing the details, seeing these analyses as straightforward implementations of previous tools with slight changes in the details of the calculation of a test statistic. We hope that students are well prepared at this point to understand the reasoning behind the big idea of comparing within-group to between-group variation, but you might want to spend some extra time on this principle. You will also want to focus on emphasizing all the steps of a statistical analysis (examination of study design, numerical and graphical summaries, and statistical inference including defining the parameters of interest, stating the hypotheses, commenting on the technical conditions, calculating the test statistic and p-value, making a decision about the null hypothesis, and then finally stating an overall conclusion that touches on each of these issues).
Investigation 5.4 steps students through the calculations and comparison of within group and between group variability and uses a technology simulation to examine the empirical sampling distribution of the test statistic (mean abs difference and F statistic). Question (s) is a key one for assessing whether students understand the basic principle. More details are supplied in the terminology detour and general technology instructions for carrying out an ANOVA analysis.
In Investigation 5.5, students initially practice calculating the F-statistic by hand. Another applet (ANOVA Simulation) is used to explore the effects of sample size, size of the difference in population means, and the common population variance on the ANOVA table and p-value. We have tried to use values that allow sufficient sensitivity in the applet to see some useful relationships. It is interesting for students to see the variability in the F-statistic and p-value from sample to sample both when the null hypothesis is true and when it is false. An interesting extension would be to collect the p-values from different random samples and examine a graph of their distribution, having students conjecture on its shape first.
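For the by-hand calculation, a minimal R sketch with made-up data (the group labels and values are illustrative, not taken from the investigation):

    # Compare between-group to within-group variation directly.
    y <- c(23, 25, 28, 30, 32, 35, 19, 21, 24)
    group <- factor(rep(c("g1", "g2", "g3"), each = 3))
    k <- nlevels(group); n <- length(y)
    means <- tapply(y, group, mean)
    ns <- tapply(y, group, length)
    ssb <- sum(ns * (means - mean(y))^2)        # between-group sum of squares
    ssw <- sum((y - means[group])^2)            # within-group sum of squares
    Fstat <- (ssb / (k - 1)) / (ssw / (n - k))
    pf(Fstat, k - 1, n - k, lower.tail = FALSE) # p-value
    anova(lm(y ~ group))                        # should match the by-hand results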
Practice Problem 5.5A is a particularly interesting follow-up question, re-analyzing the Spock trial data using ANOVA instead of chi-square, and considering how the two analyses differ in the information provided. Practice Problem 5.5B demonstrates the correspondence of ANOVA to a two-sided two-sample t-test when only two groups are being compared, and is worth highlighting. An interesting in-class experiment to consider in the section on ANOVA is the melting time of different types of chips (e.g., milk chocolate vs. peanut butter vs. semi-sweet), especially considering each person as a blocking factor (if you are interested in briefly discussing "two-way" ANOVA). You might also consider at least demonstrating multiple comparison procedures to your students. (The confidence interval checkbox in the Comparing Groups applet applies 95% confidence intervals, but using the pooled standard deviation.)
Section 3: Relationships Between Quantitative Variables
Timing/materials: Technology is used for basic univariate and bivariate graphs and numerical summaries in Investigation 5.6 (CatJumping.txt). Technology is used to calculate correlation coefficients in Investigation 5.7 (golfers.txt). These two investigations may take about 45 minutes. The applet exploration revolves around the Guess the Correlation applet and will take 10-15 minutes. Investigation 5.8 uses a new version of the Analyzing Two Quantitative Variables (javascript) applet, at the end shows them how to determine a regression equation using technology (HeightFoot.txt), and can take upwards of 60 minutes. An applet exploration also uses this applet to explore the resistance of least squares regression lines and influential observations. Investigation 5.9 also involves technology (movies03.txt) and may take 30 minutes.
This section presents tools for numerical and graphical summaries in the setting of two quantitative variables. Here we are generally less concerned about the type of study used. The next section will focus on inference for regression.
Investigation 5.6 focuses on using technology to create scatterplots and then introducing appropriate terminology for describing them.
Investigation 5.7 uses data from the same source (PGA golfers) to explore varying strengths of linear relationships and then introduces the correlation coefficient as a measure of that strength. One thing to be sure that students understand is that low scores are better than high scores in golf; similarly a smaller value for average number of putts per hole is better than a larger value, but some other variables (like driving distance) have the property that higher numbers are generally considered better. Discussion in this investigation includes how the points line up in different quadrants as a way of visualizing the strength of the linear relationship. Question (i) is a particularly good one to give students a few minutes to work through on their own in collaborative groups. Students should also be able to describe properties of the formula for r (when positive, negative, maximum and minimum values, etc.); in fact, our hope in (k)-(n) is that students can quickly tell you these properties rather than you telling them. Students apply this reasoning to order several scatterplots in terms of strength and then use technology to verify their ordering.
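If you want students to verify the formula's properties numerically, here is a small R sketch of r as (essentially) the average product of z-scores, with made-up values standing in for the golfer variables:

    x <- c(280, 290, 300, 285, 295)        # e.g., driving distances (yards)
    y <- c(1.80, 1.78, 1.75, 1.79, 1.76)   # e.g., average putts per hole
    zx <- (x - mean(x)) / sd(x)
    zy <- (y - mean(y)) / sd(y)
    sum(zx * zy) / (length(x) - 1)         # by-hand r
    cor(x, y)                              # built-in check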
If you want students to have more practice in estimating the size of the correlation coefficient from a scatterplot, the Guess the Correlation Applet Exploration generates random scatterplots, allows students to specify a guess for r and then shows them the actual value. The applet keeps track of their guesses over time (to see if they improve) as well as the guesses vs. actual and errors vs. actual to see which values of r were easier to identify (e.g., closer to -1 and 1). Questions (g)-(i) also get students to think a bit about the meaning of r. Students often believe they are poor guessers and that the correlation between their guesses and the actual values of r will be small. They are often surprised at how large this correlation is, but should realize that this will happen as long as they can distinguish positive and negative correlations and that they may find a high correlation if they guess wrongly in a consistent manner.
Practice Problem 5.7A is a very quick test of students' understanding; question (b) in particular confuses many students. You will also want to continually remind students that r measures the amount of the linear association (e.g., you could jump ahead to the Walmart data and explore the correlation of the number of SuperCenters vs. time).
Investigation 5.8 steps students through a development of least squares regression. Starting after (g), they use a javascript applet with a moveable line feature to explore "fitting the best line" and realize that finding THE best line is nontrivial and even ambiguous, as there are many reasonable ways to measure "fit." We emphasize the idea of a residual, the vertical distance between a point and the line, as the foundation for measuring fit, as prediction is a chief use of regression. In question (o) we briefly ask students to consider the sum of absolute residuals as a criterion, and then we justify using SSE as a measure of the prediction errors. In questions (k)-(m) many students enjoy the competitive aspect of trying to come up with better and better lines according to the two criteria. Students can then use calculus to derive the least squares estimators directly in (t) and (u). Questions (u) and (w) develop the interpretation of the slope coefficient and question (y) focuses on the intercept. Question (z) warns them about making extrapolations from the data. The applet is then used in questions (aa) and (bb) to motivate the interpretation of R^2. Once the by-hand derivation of the least squares estimates is discussed, instructions are given for obtaining them in Minitab/R. The applet exploration allows students to investigate resistance properties of the least squares line, the idea of influential observations, and how to identify potentially influential observations. (This applet can also be used to obtain basic regression output.) The Excel Exploration also allows them to explore properties of the sum of absolute errors and the corresponding "best fit" line.
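The two criteria can also be compared numerically in R; here is a hedged sketch with made-up height/foot-length values (the real data are in HeightFoot.txt), using a general-purpose optimizer rather than the Excel approach:

    foot   <- c(24, 25, 26, 27, 28, 29, 30)        # hypothetical foot lengths (cm)
    height <- c(160, 165, 168, 172, 175, 180, 185) # hypothetical heights (cm)
    sse <- function(b) sum((height - (b[1] + b[2] * foot))^2)
    sae <- function(b) sum(abs(height - (b[1] + b[2] * foot)))
    ls <- coef(lm(height ~ foot))  # least squares (minimizes SSE)
    optim(ls, sse)$par             # should (approximately) recover ls
    optim(ls, sae)$par             # the "best fit" line under absolute errors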
New: We have added some discussion and interpretation of s in the regression model. The applet now also allows users to display s.
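For reference, in simple linear regression s is the estimated standard deviation of the responses about the regression line:

    s = \sqrt{\frac{SSE}{n-2}}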
Investigation 5.9 provides practice in determining and interpreting regression coefficients with the additional aspect, which students often find interesting, of comparing the relationship across different types of movies, although the data are getting a bit dated. Question (i) asks students to remember how to subset the data as in Investigation 2.1.
Section 4: Inference for Regression
Timing/materials: Investigations 5.10 and 5.11 revolve around the Analyzing Two Quantitative Variables applet. In Investigation 5.10 the focus is on random sampling from a finite population (using new applet functionality); in Investigation 5.11 the focus is on random shuffling of the response variable. Timing will depend on whether you are primarily demonstrating the results or letting students explore. The Talley5K.txt data set will need to be parsed by whatever software you are using to fit into the 9 columns (or convert first in Excel using Data > Text to Columns, Delimited by spaces). Investigation 5.12 introduces the basic regression model assumptions (you can see these visually with the Create Population option, selecting Observed x rather than Bivariate), which are then applied in Investigation 5.13, returning to the CatJumping data. This investigation can also be used to explore confidence vs. prediction intervals. The need for and use of transformations are now explored in Investigation 5.14 (housing.txt).
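In R, one hedged way to read the file, assuming the fields are separated by whitespace, a header row is present, and no field contains embedded spaces (check the result, since names with spaces would throw the columns off):

    talley <- read.table("Talley5K.txt", header = TRUE)  # splits on runs of whitespace
    str(talley)  # confirm that the 9 columns came through as expected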
Investigation 5.10 follows the strategy that we have used earlier in the course: taking repeated random samples from a finite population in order to examine the sampling distribution of the relevant sample statistic. We ask students to use an applet to select random samples from a hypothetical population that matches the characteristics of the 5K run setting and follows the basic regression model, but where the population has been chosen so that the correlation (and therefore the slope) between time and age is zero. The goal of the applet is for students to visualize sampling variability in regression slopes (and lines) as well as the empirical sampling distribution of the sample slopes. This process should feel very familiar to students at this point, although you should be aware that it feels different to some students because they are watching sample regression lines change rather than seeing simpler statistics such as sample proportions or sample means change. Students also explore the effects of sample size, variability in the explanatory variable, and variability about the regression line on this sampling distribution. This motivates the formula for the standard error of the sample slope. It is interesting to help students realize that when choosing the x values, as in an experiment, more variability in the explanatory variable is preferred, a sometimes counterintuitive result for them. Students should also note the symmetry of the sampling distribution of sample slope coefficients and come to believe that a t-distribution will provide a reasonable model for the standardized slopes using an estimate for the standard deviation about the regression line. Students calculate the corresponding t-statistic for the 5K data by hand, which can be confirmed with technology in (z). Investigation 5.11 uses a different approach for the simulation, a randomization test approach which scrambles the response variable values. Students may find it interesting to compare these approaches. The implications are not substantial, but students may also be able to talk about how the standard errors measure slightly different types of randomness. This investigation "forward references" the basic model assumptions coming in Investigation 5.12.
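A minimal R sketch of the repeated-sampling idea (all numbers are illustrative; the applet's population matches the actual 5K summary statistics instead):

    # Draw repeated samples from a model with true slope 0 and collect the slopes.
    slopes <- replicate(1000, {
      x <- runif(30, 20, 60)                # e.g., runners' ages
      y <- 25 + 0 * x + rnorm(30, sd = 5)   # true slope 0, sd about the line 5
      coef(lm(y ~ x))[2]
    })
    hist(slopes)
    sd(slopes)  # roughly sigma / sqrt(sum((x - mean(x))^2)) on average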
The second practice problem for Investigation 5.10 offers students the chance to look at the official results online. They will need to scrape these data into R or Minitab or Excel (they actually copy and paste quite easily into Minitab) and may want to convert the times into minutes. If you want to do more non-traditional analyses with these data, you can have students find the three runners not listed in the first data set, which other runner still seems to be in the dataset twice, which family name had the largest group of runners, etc.
Applet notes: In Investigation 5.10 it is important that the data are pasted in as (explanatory, response) in the first place. There is an outlier that needs to be removed, either before they paste in, or at the bottom of the data window, or by clicking on the observation, pressing the Delete button, and then pressing Use Data. (Once you press Use Data the outlier is lost to the applet; otherwise you can press Revert.) My suspicion is that this runner has two times listed because he finished and then doubled back to join his mom or sister, who also seems to be in the dataset. If you are trying to click on an individual observation, again make sure your window is scrolled to the top, or the applet will be confused by your mouse coordinates. Then students check the Create Population box, and they should see the corresponding summary statistics that are being matched to the 5K data. They still need to press the Create Population button and then should see the population of 20,000 observations (try to make sure they don't press the Use Data button, or the sample data will be lost to the applet, though they can be repasted back in later). This population is randomly generated, so results may differ slightly from student to student. The null distribution SD created this way should be a pretty good match to the SE reported in the standard regression output. In part (aa), they should be able to uncheck the Create Population box and then press Revert to return to the original data (after a slight pause), or else they can paste the original data back in.
Investigation 5.12 begins by having students consider the "ideal" setting for such inferences – normal populations with equal variance that differ only in their means, which follow a linear pattern with the explanatory variable. We especially advocate the LINE mnemonic. Residual plots are introduced as a method for checking the appropriateness of this basic regression model. Investigation 5.13 then applies this model to the cat jumping data, including confidence intervals for the population slope, and prediction intervals for individual values vs. confidence intervals for mean responses (a distinction for which you can draw the connection to univariate prediction intervals from Chapter 3). Minitab provides nice visuals for these latter intervals. The bow-tie shape they saw in the applet is also a nice visual here for justifying the "curvature" seen especially in prediction intervals.
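A hedged R sketch of both steps (the column names for CatJumping.txt are assumptions here, not taken from the file):

    cats <- read.table("CatJumping.txt", header = TRUE)
    fit <- lm(velocity ~ mass, data = cats)        # assumed variable names
    plot(fitted(fit), resid(fit)); abline(h = 0)   # residual plot for the LINE checks
    new <- data.frame(mass = 4)
    predict(fit, new, interval = "confidence")     # CI for the mean response
    predict(fit, new, interval = "prediction")     # PI for an individual cat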
Investigation 5.14 finds problems with the residual analysis and explores a transformation for addressing the conditions. Students should realize that additional steps can be taken when the conditions are not met, and we try not to get too bogged down at this time in interpreting the transformation. The Technology Exploration introduces students to the "regression effect." There is a nice history to this feature of regression, and it also provides additional cautions to students about drawing overly strong conclusions from their observations (e.g., "regression to the mean"). We often supplement this discussion with excerpts from the January 21, 2001 Sports Illustrated article on the cover jinx: "It was a hoot to work on the piece. On the one hand, we listened as sober statisticians went over the basics of 'regression to the mean,' which would explain why a hitter who gets hot enough to make the cover goes into a slump shortly thereafter."
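Returning to the transformation idea at the start of this investigation, a hedged R sketch (the housing.txt column names and the log choice are assumptions for illustration):

    housing <- read.table("housing.txt", header = TRUE)
    fit1 <- lm(price ~ sqft, data = housing)       # assumed variable names
    plot(fitted(fit1), resid(fit1))                # look for fanning or curvature
    fit2 <- lm(log(price) ~ sqft, data = housing)  # one common re-expression
    plot(fitted(fit2), resid(fit2))                # recheck the conditions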
Examples
This chapter includes four worked-out examples. Each of the first three deals with one of the three main methods covered in this chapter: chi-square tests, ANOVA, and regression. The fourth example analyzes data from a diet comparison study, where we ask several questions and expect students to first identify which method applies to a given question. Again we encourage students to answer the questions and analyze the data themselves before reading the model solutions.
Summary
At the end of this chapter, students will most need guidance on when to use each of the different methods. The table may be useful but students will also need practice identifying the proper procedure merely from a description of the study design and variables. We also like to remind students to be very conscious of the technical conditions underlying each procedure and that they must be checked and commented on in any analysis.