INVESTIGATING STATISTICAL CONCEPTS, APPLICATIONS, AND METHODS, Second Edition
NOTES FOR INSTRUCTORS
January, 2015
CHAPTER 2
This chapter focuses on comparing two proportions, but in a manner that mirrors the analysis process from Chapter 1. You can emphasize to students that the main structure is the same: determining good graphs and numbers to look at when analyzing the sample data, followed by inference procedures that build on an appropriate simulation model (but can lead to “exact” calculations and then to large-sample normal approximations), which allow us to draw conclusions beyond the sample data. In fact, those conclusions will get more interesting as we get into issues of causation in addition to generalizability. The last section (Section 4) focuses on relative risk and odds ratios. These are skippable topics (or you can focus on descriptive but not inferential methods), though they also give students the opportunity to apply the analysis process and thinking habits in new ways, which can be very empowering.
Section 1
This first section extends what they just finished working on in Chapter 1, comparing two population proportions arising from independent random samples, while raising new issues to consider with group comparisons (confounding). An earlier investigation on Gender and Blood Donations is no longer included but could serve as a good follow-up homework problem (see Exercise #1), where students are asked to treat two variables collected on one random sample as a problem involving two independent samples.
Investigation 2.1: Teen Hearing Loss (cont.)
Timing: If you have students use all of the new technology tools in class, including running their own simulations, this will probably take the first 50-minute class period. Then the second class period can focus on applying the two-sample z-procedures, perhaps with an additional example. (There are two practice problems that can correspond to this split in days as well.)
Materials needed: If split over two days, you may want to have simulation results ready to discuss at the start of day 2.
You will want to enforce some caution in interpreting the difference in conditional proportions. Encourage students to always focus on the difference in proportions instead of percentages. As they will see if you discuss the material on relative risk, a “change of 10%” usually implies multiplication rather than addition/subtraction. You will also want to caution students to be careful in describing how they conditioned their calculations (e.g., the proportion of senators who are male is very different from the proportion of males who are senators). You may also want to show Excel as an option for the segmented bar graph.
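To make the difference between an additive and a multiplicative comparison concrete, here is a minimal sketch. The values .15 and .19 are just the rounded proportions mentioned for the hearing loss simulation, and the function names are hypothetical.

```python
# Illustrative conditional proportions; treat them as placeholders.
p_earlier = 0.15
p_later = 0.19

def difference_in_proportions(p1, p2):
    """Additive comparison: p2 - p1, on the proportion scale."""
    return p2 - p1

def percent_change(p1, p2):
    """Multiplicative comparison: change relative to the starting value p1."""
    return 100 * (p2 - p1) / p1

diff = difference_in_proportions(p_earlier, p_later)
pct = percent_change(p_earlier, p_later)
print(f"difference in proportions: {diff:.2f}")
print(f"percent change:            {pct:.1f}%")
```

The same two proportions differ by .04 (four percentage points) but by roughly 27% in relative terms, which is why “a change of 4%” is ambiguous unless students say which comparison they mean.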
For the simulation, we assumed the same value of π for both populations. You can let students know that the value chosen for π is not that critical, as long as it is the same for both populations. (A good follow-up question is why we didn’t use .15 and .19 as π₁ and π₂.) You will also want to make sure students understand the steps being performed by the simulation at each stage. Try to emphasize the common structure and the “why” behind the simulation steps. You can also get them used to worrying about rounding errors in finding the p-value. Also watch the use of spacing in R commands so that you are not accidentally replacing values. In (p), you may want to have them save the counts first, and then divide by the sample size to get the sample proportions. The Boolean expression being used in the simulation (part q) is an idea that they will return to repeatedly. Notice the simplification of the technical conditions made in the Summary box at the end of the investigation. Students are often bothered by the direction of subtraction and by how success and failure are defined; you can have them work through some calculations to see how to make these adjustments from the first calculation.
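If you want a reference version of the simulation logic outside of R or Minitab, the steps might be sketched as follows. This is only an illustration: the sample sizes and counts are hypothetical (chosen so the sample proportions come out to .15 and .19), using the pooled proportion as the common probability is one plausible choice, and the function names are not from the text.

```python
import random

def simulate_differences(n1, n2, pi, reps, seed=None):
    """Simulate differences in sample proportions (group 2 minus group 1)
    for two independent random samples that share the same success
    probability pi."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Save the counts first, then divide by the sample sizes.
        count1 = sum(rng.random() < pi for _ in range(n1))
        count2 = sum(rng.random() < pi for _ in range(n2))
        diffs.append(count2 / n2 - count1 / n1)
    return diffs

def empirical_p_value(diffs, observed, tol=1e-9):
    """Proportion of simulated differences at least as large as the observed
    one. The small tolerance keeps floating-point rounding from excluding
    simulated values that equal the observed difference."""
    hits = sum(d >= observed - tol for d in diffs)  # Booleans sum as 0/1
    return hits / len(diffs)

# Hypothetical counts chosen so the sample proportions are .15 and .19.
n1, x1 = 200, 30
n2, x2 = 200, 38
observed = x2 / n2 - x1 / n1
common_pi = (x1 + x2) / (n1 + n2)  # assumed choice for the shared probability

diffs = simulate_differences(n1, n2, common_pi, reps=1000, seed=1)
p_value = empirical_p_value(diffs, observed)
print(f"observed difference: {observed:.3f}")
print(f"empirical p-value:   {p_value:.3f}")
```

Summing a Boolean comparison is the same idea as the Boolean expression in part (q), and the tolerance is one way to guard against the rounding problem when tallying results “at least as extreme” as the observed difference.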
Technology reminders: the MTB prompt in Minitab and using “+” when an R command continues onto the next line. Students also have freedom to name vectors whatever they want. If you want to display multiple graphs, you can use par(mfrow=c(r, c)), where the first number is the number of rows and the second number is the number of columns, and then zoom the graph size as well.
Investigation 2.2: Nightlights and Near-sightedness
Timing: A previous iteration of this activity as a class discussion, along with discussion of the practice problems, took approximately 30 minutes. (Students can be asked to work through 2.2-2.4 on their own in a 50-minute period with discussion the following class period.)
Students can first use these data to practice a two-sample z-test and confidence interval, but then this is a natural time to talk about drawing cause-and-effect conclusions. (This activity does discuss how you might conduct the simulation differently by modeling one random sample, but we wouldn’t stress that at this point.) You can build on the teen hearing loss study: even with a statistically significant difference between the two years, we aren’t able to isolate what might have caused the increase in hearing loss. Students are also usually pretty quick to develop (collaboratively) an alternative explanation in the night-light study. (By the way, it should be easy to find fairly recent news articles related to this and/or similar studies.) But you will want to make sure they are very clearly tying the alternative explanation (e.g., genetics) to both the explanatory variable and the response variable. The questions at the end of the investigation aim to help make the distinction between “other variables” and variables that are truly confounding, being tied to both the explanatory and response variables in the study. The practice problems give them practice identifying and explaining confounding variables, and these are good ones to discuss in class as well.
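For checking students' two-sample z-test and confidence interval work by hand, a small sketch of the calculations may help. The counts below are hypothetical (not the study data), the function names are made up for illustration, and the formulas are the usual pooled standard error for the test and unpooled standard error for the interval.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sample_z(x1, n1, x2, n2, z_star=1.96):
    """Two-sample z-test (pooled SE) and confidence interval (unpooled SE)
    for the difference in proportions, group 1 minus group 2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_test = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_test
    p_two_sided = 2 * (1 - normal_cdf(abs(z)))
    se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - z_star * se_ci, p1 - p2 + z_star * se_ci)
    return z, p_two_sided, ci

# Hypothetical counts: 60 of 200 successes in group 1, 40 of 200 in group 2.
z, p_two_sided, ci = two_sample_z(60, 200, 40, 200)
print(f"z = {z:.3f}, two-sided p-value = {p_two_sided:.4f}")
print(f"95% CI for p1 - p2: ({ci[0]:.3f}, {ci[1]:.3f})")

# Reversing the direction of subtraction flips the sign of z and of the
# interval endpoints, but not the two-sided p-value.
z_rev, p_rev, ci_rev = two_sample_z(40, 200, 60, 200)
```

The last call is a quick way to show students what does and does not change when the groups are subtracted in the other order.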
Another very nice context here is the Oklahoma City Thunder’s home game record in their second year in existence (see Exercise #4).
Section 2
We then move into considering different study designs and the scope of conclusions we can potentially draw based on the study design. (In a previous iteration, we had proceeded to randomization tests first, but this flow may be more natural, and students should stay engaged if the study contexts are sufficiently interesting.)
Investigation 2.3: Handwriting and SAT Scores
Timing: The ideas in this investigation can be presented rather quickly through a class discussion, approximately 30 minutes.
This investigation begins with practice identifying explanatory and response variables as introduced in the previous investigation. Then two different study designs are compared. With experiments, we prefer to cite the imposition of the explanatory variable as the critical feature, with random assignment as the way to properly carry out such an experiment; only if you have both will you be able to draw cause-and-effect conclusions. (The practice problem emphasizes this: even though they “did something” to the subjects, they did not impose the explanatory variable and therefore no causation can be drawn from the subjects’ emotions.) Students should also realize that experiments are not always feasible and/or can create an environment that is too artificial to be generalizable to the real world.
Investigation 2.4: Have a Nice Trip
Timing: Timing will depend on how much you have students use the applet themselves. We often do it more as a class demo in which case it may only take 15 minutes.
Technology: Use of the Randomizing Subjects applet (JavaScript). You should be able to find some videos demonstrating the experiment (e.g., lowering vs. elevating).
This investigation aims to explore properties of random assignment. You will want to continue to emphasize to students that now you are focusing on a different source of randomness, one that is implemented differently than random sampling (what you do with the subjects once you have them rather than how you get the subjects in the first place), with different goals and consequences. We have changed the applet to focus on 24 subjects rather than 12. We are hoping this will reduce the chance variability enough to be more convincing to students. Still, students will probably need a fair bit of help seeing the “big picture” this applet is trying to show them. Students are often skeptical of the power of random assignment, so you will want to drive home the point that, with random assignment, you really can believe there are no other substantial differences between the two groups. Also remind students that you are talking about equality of the groups, not of individuals. We also strongly recommend highlighting the table at the end of the investigation as one of the most important in the entire text! We like to emphasize that although we would like our study to be in the top left corner of this table, that is very rare (practice problem 2.4 is the only one in the text!), though many studies are in the bottom left. (You can discuss recent clinical trials that have found difficulties when the drug went to market for a more general population than the sample used to evaluate the side effects of the drug.)
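If you would like to show the “big picture” of random assignment without the applet, a rough stand-in is sketched below. It is not the applet's code: the 24 subjects match the applet, but the lurking trait (8 of the 24 subjects having it) and the function name are hypothetical.

```python
import random

def assignment_differences(traits, group_size, reps, seed=None):
    """Repeatedly assign subjects at random to two groups and record the
    difference (group A minus group B) in the proportion having a trait."""
    rng = random.Random(seed)
    subjects = list(traits)
    diffs = []
    for _ in range(reps):
        rng.shuffle(subjects)            # one random assignment
        group_a = subjects[:group_size]
        group_b = subjects[group_size:]
        diffs.append(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    return diffs

# Hypothetical lurking variable: 8 of the 24 subjects have some trait (coded 1).
traits = [1] * 8 + [0] * 16
diffs = assignment_differences(traits, group_size=12, reps=1000, seed=1)

mean_diff = sum(diffs) / len(diffs)
print(f"mean difference across assignments: {mean_diff:.3f}")
print(f"largest difference seen:            {max(abs(d) for d in diffs):.3f}")
```

The differences center near zero across many assignments, which is the sense in which the groups (not the individuals) are balanced. Individual assignments can still be noticeably unbalanced with only 24 subjects, which is worth pointing out to skeptical students.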
Investigation 2.5: Botox for Back Pain
Timing: Again, this can be used as a class discussion (~30 min) or as student exploration/practice. Some of the ideas will be new to students, but you will want to emphasize that their common sense can be pretty useful here.
The main goals of this investigation are to expose students to a genuine excerpt from a journal article (show them how far they have come!) and to provide an opportunity to discuss more subtle issues with experimental design (e.g., standardized measurements, placebo treatments, realism, and feasibility). Note “randomized” is mentioned in the title but not in the abstract. You may want to ask why the researchers didn't only examine 11/15. The first practice problem continues the theme of feasibility; the second provides additional practice with defining the key terms and their “statistical” meanings over the everyday meanings, and with the two components of scope of conclusions.
Section 3
Now we return to investigating statistical significance, again starting with simulation, but initially replicating the random assignment process rather than random sampling.
Investigation 2.6: Dolphin Therapy
Timing: The background material can take about 15 minutes and the card shuffling 10-15 minutes. Then using the applet and drawing conclusions can be another 15 minutes. If split over two days, getting through the tactile simulation on day one can work well.
Materials needed: For the tactile simulation, you can use packs of 30 playing or index cards, each pack with 13 of one type and 17 of another type (e.g., red vs. black, suits, face cards vs. non-face cards). More recently we have used colored index cards so students are less distracted by the type of card. Then the investigation makes use of a JavaScript applet, which will load with data for the dolphin study.
After setting the stage, you will want to use some caution in defining the parameters, as there are multiple approaches. One is in terms of treatment probabilities, but this may not feel very concrete (and different from the sample proportions) to students. Another is to define them in terms of population proportions, where you consider the populations to be all individuals who could potentially be on these treatments. Try to get them to think hard about question (f), even taking a few minutes to brainstorm in pairs.
Depending on your class size, you may have enough observations if they only repeat the process once or twice. In collecting the simulated data from the class, it is probably easiest to focus on the number of successes in group A (rather than the equivalent difference in sample proportions). The latter calculation takes time and is more prone to error (if you do use it, make sure they all subtract in the same direction), and defining the random variable as a count will be consistent with Fisher’s Exact Test when they turn to that. Still, you will want to emphasize the equivalence to students, and it is probably easier for them to think about why the difference in proportions should be near zero rather than thinking in terms of E(X). You can emphasize that the simulation is helping you count how many “tables” are at least as extreme as the one observed. We feel the tactile simulation is still useful at this point to help students see the distinction.
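The card shuffling can be mirrored in a few lines of code if you want to extend the class results. This sketch assumes the 30 cards (13 “success”, 17 “failure”) are dealt into two groups of 15 and that the observed table had 10 successes in group A; both of those numbers are assumptions here and should be checked against the investigation. The function names are hypothetical.

```python
import random

def shuffle_once(cards, group_a_size, rng):
    """Shuffle the cards and count the successes dealt to group A."""
    rng.shuffle(cards)
    return sum(cards[:group_a_size])

def randomization_test(n_success, n_failure, group_a_size, observed, reps, seed=None):
    """Empirical p-value: fraction of random assignments giving at least
    `observed` successes in group A."""
    rng = random.Random(seed)
    cards = [1] * n_success + [0] * n_failure
    counts = [shuffle_once(cards, group_a_size, rng) for _ in range(reps)]
    p_value = sum(c >= observed for c in counts) / reps
    return p_value, counts

# 13 success cards and 17 failure cards, as in the tactile simulation.
# Group size (15) and observed count (10) are assumed for illustration.
p_value, counts = randomization_test(13, 17, group_a_size=15, observed=10,
                                     reps=1000, seed=1)
print(f"empirical p-value: {p_value:.3f}")

# Equivalent statistic: difference in proportions, group A minus group B.
diffs = [c / 15 - (13 - c) / 15 for c in counts]
```

Because the difference in proportions is a fixed increasing function of the count in group A, tallying counts of 10 or more identifies exactly the same shuffles as tallying differences at least as large as the observed one.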
Some students may struggle with how this process fits into the earlier analyses (e.g., one sample or two random samples). You may want to use diagrams and/or concept maps to help them see how this fits into the larger picture. Also, in interpreting their p-value at this point, we try to emphasize whether they are talking about the percentage of random samples vs. the percentage of random assignments, and no longer let them simply say “by chance.”
Investigation 2.7: Is Yawning Contagious?
Timing: 30-50 minutes (parts of this investigation can be assigned in advance)
Materials needed: Uses a generic two-way table inference applet (JavaScript) that allows the user to enter their own table (you can enter the labels and counts, with spaces and line breaks, and then press Use Table). You can find a link to a video about this study at the Discovery website. Another video is a bit more focused on the study they will analyze. It’s also useful to emphasize how the research process is very iterative. Students can also be asked to view one of the videos between classes.
Students are given more practice and freedom in organizing the study results. Depending on the level of your students, you may want to go through several versions of the hypergeometric probability calculation by hand and even with the technology, practicing the appropriate input values depending on how the table is set up. Other students may ask about the conditioning on the observed data; there is actually some debate in the statistics community, but this is the standard approach (and hotly advocated by Fisher), especially for tables with smaller observed counts. Students will struggle with the distinction between binomial sampling (sampling with replacement) and hypergeometric sampling (sampling without replacement), and you will want to think about how much to emphasize this.
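To practice matching input values to the table setup, it can help to compute the same hypergeometric p-value several ways. The sketch below uses illustrative counts (50 subjects, 14 successes overall, groups of 34 and 16, and 10 successes in the larger group); these are assumptions and should be replaced with the table from the investigation. The function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, total, successes, group_size):
    """P(X = k): k successes land in a group of size group_size when
    `successes` of the `total` subjects are successes."""
    return (comb(successes, k) * comb(total - successes, group_size - k)
            / comb(total, group_size))

def tail_probability(ks, total, successes, group_size):
    """Sum the hypergeometric probabilities over the outcomes in ks."""
    return sum(hypergeom_pmf(k, total, successes, group_size) for k in ks)

# Illustrative two-way table (assumed counts, see the note above).
total, yawners = 50, 14
seed_group, control_group = 34, 16
seed_yawners = 10
control_yawners = yawners - seed_yawners

# Setup 1: count yawners in the seed group (upper tail).
p1 = tail_probability(range(seed_yawners, min(yawners, seed_group) + 1),
                      total, yawners, seed_group)
# Setup 2: count yawners in the control group (lower tail).
p2 = tail_probability(range(0, control_yawners + 1),
                      total, yawners, control_group)
# Setup 3: swap the roles of rows and columns: count seeded subjects
# among the yawners (upper tail).
p3 = tail_probability(range(seed_yawners, min(seed_group, yawners) + 1),
                      total, seed_group, yawners)

print(f"p-values from the three setups: {p1:.4f}, {p2:.4f}, {p3:.4f}")
```

All three setups describe the same set of “at least as extreme” tables, so the p-values agree; what changes is which margin plays the role of the number of successes and which plays the role of the group size.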
Investigation 2.8: CPR vs. Chest Compressions
Timing: Investigation 2.8 focuses on descriptive statistics and the difference in proportions and the normal approximation. Investigation 2.9 focuses on relative risk and related inferential methods. The combination may take around 75 minutes.
This is another study that has been in the news recently and you may want to find recent articles and/or the current AHA recommendations. Students are given flexibility in setting up their table but this does impact the later components of the investigation so you may want to enforce some consistency (e.g., CC as the first column? Survival as the top row?) as you will return to this table frequently. This first investigation serves to apply the methods they have learned so far for comparing two proportions. Make sure students notice the direction implied in (d). Remind them in (h) that the 10% level really would have needed to have been set in advance.
Investigation 2.9: Flu Vaccine NEW
This is a new context (Fall 2014) that compares two different vaccines. In this study, students begin to consider limitations to these calculations. In particular, when the probability of success is small to begin with, it can be difficult to interpret a small difference in conditional proportions. This motivates discussion of relative risk (and an interpretation in terms of percentage change). After establishing some benefits to examining relative risk, we next need to consider corresponding inferential methods for obtaining p-values and confidence intervals. So we go back to simulation. Here again we model the random assignment process rather than random sampling. Even though you haven’t discussed quantitative data in much detail, most students will agree this is not the most normal/symmetric distribution. But we can still find an empirical p-value. Many students will find the technology (even generating the formulas themselves) a bit overwhelming, so you want to really emphasize the big picture and the change in parameter. Students don’t seem to mind the standard error formula simply appearing. Here, you will want to be especially careful with your rounding so that you don’t exclude the actual observed result from your tally. In fact, because of the one-to-one correspondence between the difference in conditional proportions and the relative risk, the “as extreme” outcomes are exactly the same ones as before and you find the same empirical p-value. But this doesn’t help us find a confidence interval, which we only know how to do with symmetric distributions. This motivates a transformation, which can bother many students (“aren’t you changing the data?”), so try to emphasize this as simply a rescaling of the values (the ordering of the values does not change). You apply the back transformation at the end, and the statistic will still be guaranteed to be in the interval, but it is no longer necessarily the midpoint.
Also make sure students realize why you want to consider whether the value one is in this interval rather than zero.
The idea of “efficacy” pops up here but not anywhere else.
Technology note: Keep in mind that R and Minitab assume natural log when you use “log” and that the standard deviation formula assumes natural log. When you look at the Minitab histograms, you may want to reduce the number of intervals to prevent strange binning and show a more “filled in” distribution.
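As a check on the log transformation and back transformation for relative risk, the calculation might look like the sketch below. The counts are hypothetical, the function name is made up, and the standard error is the usual large-sample formula for the natural log of the relative risk, assumed here to match the one in the text.

```python
import math

def relative_risk_ci(x1, n1, x2, n2, z_star=1.96):
    """Relative risk (group 1 over group 2) with a confidence interval built
    on the natural-log scale and then back-transformed."""
    rr = (x1 / n1) / (x2 / n2)
    # Large-sample standard error of ln(relative risk).
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    log_rr = math.log(rr)  # math.log is the natural log
    lower = math.exp(log_rr - z_star * se_log)
    upper = math.exp(log_rr + z_star * se_log)
    return rr, (lower, upper)

# Hypothetical counts: 20 of 1000 in group 1 vs. 10 of 1000 in group 2.
rr, (lower, upper) = relative_risk_ci(20, 1000, 10, 1000)
midpoint = (lower + upper) / 2
print(f"relative risk: {rr:.2f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}), midpoint {midpoint:.3f}")
```

With these made-up counts the relative risk of 2 sits inside the interval but well below its midpoint, and the interval includes 1, which is the value of “no difference” on this scale.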
Investigation 2.10: Smoking and Lung Cancer
Timing: 50 minutes
This is a good historical example, building on three of the first major studies identifying a link between smoking and lung cancer. You can even find pictures of the original journal article title pages. Try to convince students that the explanatory variable in these two studies is essentially the same.
The investigation first distinguishes different types of observational studies. The key idea is question (h), not being able to estimate the probability of lung cancer with a case-control study. This will be a difficult idea for students to get a handle on. The big consequence is that relative risk is not an appropriate statistic to examine, motivating discussion of odds ratio. You will want to go through these calculations very slowly and carefully with your students and continually emphasize that the interpretation now needs to be in terms of odds and not “chances” or “likelihood.” It is also fun to show students how much the relative risk can depend on how the two-way table is set up, but odds ratio is invariant. Some students will struggle with the relative risk and odds ratios calculations and interpretations, so try to give them plenty of practice. The idea of “dose response” pops up here but not anywhere else.
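To show students how the relative risk depends on the table setup while the odds ratio does not, a small worked example can help. The 2×2 counts below are hypothetical and the function names are made up for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk of the first column's outcome: row 1 relative to row 2.
    Table layout: row 1 = (a, b), row 2 = (c, d)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the first column's outcome: row 1 relative to row 2."""
    return (a / b) / (c / d)

# Hypothetical table: row 1 has 90 successes and 10 failures,
# row 2 has 60 successes and 40 failures.
a, b, c, d = 90, 10, 60, 40

rr_success = relative_risk(a, b, c, d)   # treat "success" as the outcome
rr_failure = relative_risk(b, a, d, c)   # treat "failure" as the outcome
or_success = odds_ratio(a, b, c, d)
or_failure = odds_ratio(b, a, d, c)
or_transposed = odds_ratio(a, c, b, d)   # condition on columns instead of rows

print(f"relative risk of success: {rr_success:.2f}")
print(f"relative risk of failure: {rr_failure:.2f} (reciprocal {1 / rr_failure:.2f})")
print(f"odds ratio for success:   {or_success:.2f}")
print(f"odds ratio for failure:   {or_failure:.3f} (reciprocal {1 / or_failure:.2f})")
print(f"odds ratio, transposed:   {or_transposed:.2f}")
```

Switching which outcome counts as “success” changes the relative risk from 1.5 to 0.25, which are not reciprocals of each other, while the odds ratio simply inverts (6 and 1/6) and is unchanged when the table is transposed. The transposition result is what makes the odds ratio usable with a case-control design.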
The second practice problem gives them more practice with odds ratio and seeing the distinction between relative risk and odds ratio.
Investigation 2.11: Sleepy Drivers
Timing: 50 minutes
As with relative risk, there is a second step to the analysis after the statistic has been defined, focusing on how to estimate p-values and confidence intervals for the new parameter. Again, the p-value will be the same but, exactly as before, we will have a skewed sampling distribution that is “fixed” by a log transformation and a standard error formula. This time the randomness in the simulation mimics the binomial sampling of the response variable categories based on the case-control design. Again, you will want to consider how much you want to emphasize this with students. Hopefully students are able to focus on the overall process by this point. These techniques will be very new to them but are much more representative of what they will see in the research literature. In fact, some statisticians argue the difference in conditional proportions is not a very useful statistic and should not be used.
The optional question asks students to generate 1000 confidence intervals. Keep in mind that for these simulation results the value of the population odds ratio equals 1.
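If you want to preview what the 1000 confidence intervals should look like, the odds ratio interval and a simple coverage check might be sketched as follows. Everything specific here is an assumption for illustration: the group sizes, the common exposure probability of .3 (which makes the true odds ratio 1), the example counts, and the function names. The standard error is the usual large-sample formula for the natural log of the odds ratio.

```python
import math
import random

def odds_ratio_ci(a, b, c, d, z_star=1.96):
    """Odds ratio for a 2x2 table with rows (a, b) and (c, d), with a
    confidence interval built on the natural-log scale and back-transformed."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, (math.exp(log_or - z_star * se_log),
                        math.exp(log_or + z_star * se_log))

def coverage(n_cases, n_controls, p_exposed, reps, seed=None):
    """Fraction of simulated intervals that contain 1 when cases and controls
    share the same exposure probability (so the true odds ratio is 1)."""
    rng = random.Random(seed)
    covered, done = 0, 0
    while done < reps:
        a = sum(rng.random() < p_exposed for _ in range(n_cases))
        c = sum(rng.random() < p_exposed for _ in range(n_controls))
        b, d = n_cases - a, n_controls - c
        if min(a, b, c, d) == 0:
            continue  # formula is undefined with an empty cell; redraw
        _, (lower, upper) = odds_ratio_ci(a, b, c, d)
        covered += lower <= 1 <= upper
        done += 1
    return covered / reps

# Hypothetical table: 30 of 100 cases exposed, 15 of 100 controls exposed.
est, (lower, upper) = odds_ratio_ci(30, 70, 15, 85)
print(f"odds ratio {est:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")

rate = coverage(n_cases=100, n_controls=100, p_exposed=0.3, reps=1000, seed=1)
print(f"proportion of 1000 intervals containing 1: {rate:.3f}")
```

The proportion of intervals capturing 1 should land near the stated confidence level, which is the point of the optional question.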
Again, make sure students notice the examples and the end-of-chapter reference material.