prepared by Allan J. Rossman, Dickinson College
Let me begin by emphasizing that there is no "right" way to teach with this book. I hope that Workshop Statistics will prove useful to students and instructors in a wide variety of settings. It can be used as a stand-alone text or as a supplement, with computers or graphing calculators, as in-class work or take-home assignments. Naturally, I think that the book will work best in a classroom environment that promotes the features I extol in the preface: active learning, conceptual understanding, genuine data, and use of technology.
The purpose of these notes is to supply information from my experiences teaching Workshop Statistics that might prove useful to other instructors. You can read through this document sequentially or skip directly to one of the sections listed below:
With 75-minute class meetings on a Tuesday/Thursday schedule, I use one topic for each class period. In a typical class period, I spend five minutes or so introducing the day's topic, collect data from the students (if the day's topic calls for it), and then let them work through the activities. A student assistant and I then spend most of the class time walking about the room, peering over shoulders, checking students' progress, and asking them questions about what they are doing. I try to visit each student several times during the period. Whether students have questions or not, I often ask personalized questions of them as a means of checking their understanding. If I notice a common problem, I might go to the front, ask everyone to stop working momentarily, and discuss the problem. I also put some answers on the board occasionally, allowing students to check their work. I might also put some computer instructions on the board at times. Finally, I try to spend the last five minutes discussing with the whole class what they were supposed to have learned that day.
I give three exams during the semester. These are not cumulative except that intelligent application of the material in later parts of the course depends largely on understanding earlier material. The exams try to stress understanding and interpretation as well as calculation. I allow students to use their books (and, of course, their responses to its questions) on the exams. Since the classroom has only half as many computers as students, I do not ask students to use computers on the exams. Instead I present them with computer output and ask questions about the interpretation of results. I sometimes devote the class period before an exam to a review session, a rare opportunity for me to lecture about the main ideas and to present analyses of sample activities.
I assign two or three homework activities per topic, collecting and grading them regularly. I try to select these so that the major ideas of the topic are covered; I also try to include a mix of problems that can be done by hand vs. with technology. I sometimes ask students to hand in their in-class activities as well, although this practice quickly creates a difficult grading burden. I encourage students to work together on the homework activities, but I require that they write up their comments individually. To facilitate students' recording information in their books, I recommend that students tear the perforated pages out of their books and place them in a three-ring binder (which I also do myself) from the start of the course. I do insist that students write directly in the spaces left in their books and not on scratch paper, for I think it helps students to have the questions and their responses together in one place.
I spend as little time as possible showing students how to use the technology, allowing students to concentrate on more substantive concerns. I spend more time on this earlier in the course as they are getting comfortable with the software and much less time as the course progresses. I sometimes use overhead projection equipment to demonstrate something, but more often I just write instructions on the board. In general I prefer students to explore the technology for themselves rather than to watch and mimic what I do.
Every topic begins with a "Preliminaries" section that asks students questions to get them thinking about the issues of the day. These questions often ask students to make guesses about the values of variables. While such questions are not central to learning statistics, I emphasize them for several reasons. They can provide the class with a sense of fun (you might even give a prize to the closest guesser of the day), and they hopefully motivate students' interest in the day's material. More importantly, they can show students that statistics is relevant to everyday life and get them used to thinking of data as more than just detached numbers.
Many topics call for data to be collected at the beginning of class. To protect students' privacy (at least somewhat), I pass around strips of scratch paper, ask students to record the information on them, and have the student assistant collect the strips and write the data on the board for students to copy into their books.
The primary goals of this topic are to get students thinking about data, help them to appreciate different types of variables, and expose them to simple visual displays (bar graphs and dotplots) of a distribution.
The Preliminaries aim to get students thinking about data not as naked numbers but as numerical information with a context. Except in rare cases, I always collect data anonymously in an effort to avoid potential embarrassments. You might use the very first preliminary question as a model for many questions in the book that ask for students' guesses or intuitions by mentioning that students' actual responses are less important than their taking the question seriously and putting forth sincere efforts. Questions will likely arise (for example, does Puerto Rico count as a nation visited?) that could lead you to discuss some of the thorny issues involved with collecting good data.
Activity 1-1 tries to get students comfortable with the distinctions between types of variables. While this is a fairly simple task, it sometimes becomes problematic later when students need to decide whether an inference situation concerns means or proportions. Questions (b) and (c) try to indicate that how one measures the variable determines its type. Emphasizing precisely what variables and cases are is important here.
Activity 1-2 introduces the bar graph as a simple visual display of the distribution of a categorical variable. It also begins to address the necessity of writing one's conclusions when analyzing data.
Activity 1-3 asks students to tally the results of a data collection. While this is very straightforward, reading tables of tallies (or frequencies) is a crucial skill in later topics. Many students have considerable difficulty with reading tables of tallies correctly.
Activity 1-4 introduces the dotplot as a simple visual display of the distribution of a measurement variable. Question (b) aims to get students to identify personally with their data analysis, and question (d) reinforces the importance of writing about one's data analysis. I suggest not giving many hints about what kinds of features to look for in a distribution; let them struggle to think of things on their own. I emphasize to students that some responses might be more insightful than others but that there are no clear-cut right/wrong answers here.
Activity 1-5 is the first to require technology for an efficient analysis. Students need to use technology to create a new variable (% of women) from existing ones and to produce a dotplot. Using technology to sort would also be helpful. You should decide whether you want to emphasize the distinction between "proportion" and "percentage". It's very important throughout the book that "proportion" refer to a number between 0 and 1 (inclusive). I expect "percentage" to be between 0% and 100%, but I make less of a fuss about this.
This topic leads students to develop a checklist of features to look for in a distribution and also introduces two new visual displays, stemplots and histograms.
Again there are many questions in the Preliminaries, aiming to get students thinking about some of the issues and data covered in the topic. I reiterate to students that I don't care how well they guess things like Jurassic Park's box office receipts, but I do care that they engage themselves with such questions and respond to them conscientiously. For the data collection in question 7, one method that works well is to use a tape measure to mark off measurements both vertically and horizontally on a chalk/whiteboard and have students work in pairs to measure their heights and armspans against those board markings.
Activity 2-1 leads students to develop a checklist of six features to consider when describing a distribution of data. You might want to lead students through this activity with a class discussion to make sure that nobody misses the point. Stressing the terminology of right- vs. left-skewness is probably worthwhile.
Activity 2-2 introduces the stemplot. Some students may not recognize that the easiest way to construct it is to go through the data in the order presented rather than looking for all of the single digits and then all of the teens and then all of the twenties and so on. Questions (f)-(h) lead students to anticipate the five-number summary; at this point I just expect them to come up with reasonable answers without using rules that will be developed in Topics 3 and 4.
Activity 2-3 asks students to interpret a histogram. Some will struggle to understand that the interval with midpoint 2000 extends from 1000 to 3000, and so on. In question (d) I have in mind that three clusters emerge: public institutions and two groups of private schools.
Activity 2-4 pushes students a small step toward autonomy by asking less specific questions. Rather, it asks students to enter their own data into the technology to look at a couple of visual displays and to comment on key features that the displays reveal about the distribution.
This topic helps students understand the mean and median as measures of the center of a distribution and explore properties (such as resistance) of those measures. I try to stress that while measures of center are very important and useful, in most cases one still needs to look at a picture of the entire distribution of data.
The Preliminaries are briefer than in previous topics, but they again try to generate student interest and thinking about the issues and data for the topic. When collecting data on distances from home, you might want to discuss measurement errors and how different the actual mileages (as opposed to the perceived mileages) might be.
Activity 3-1 covers the basic calculations of the mean and median. I don't expect students to find the mean or median in (b); in fact, I'd prefer a more creative response. You will probably have to help many students in question (i), where they are to make the jump to the general case for identifying the location of the median.
Activity 3-2 simply introduces finding the median of an even number of observations.
Activity 3-3 leads students to investigate properties of the mean and median. You will need to show students how to use technology to perform these calculations. Questions (a) and (c)-(e) are good examples of questions where I want students to make thoughtful predictions, but I don't care much about how accurate their predictions are. I do care, of course, that they rethink their predictions in (b) and (g) if they turn out to be inaccurate. Question (b) should show that the mean and median do in fact measure the center of the distribution. Question (g) tries to establish that the mean generally exceeds the median for right-skewed distributions, that the reverse is true for left-skewed distributions, and that the mean and median are similar with symmetric distributions. You might want to interrupt the class and make sure that this point hits home. Questions (h)-(l) demonstrate that the mean is not resistant to outliers but the median is. Many students struggle with question (m), which tries to make the point that the mean is not a sensible measure with categorical variables.
Activity 3-4 aims to help students see an important limitation of measures of center. The moral here is that one often wants to consider the entire distribution and not just a measure of center. While the median readability level of pamphlets equals the median reading level of patients, many patients are left without a single pamphlet at their reading level. Some students, not realizing the importance of the "under 3" and "above 12" designations, will need guidance with (a). You might also be prepared for many students ignoring the tallies in (b) and just treating the distributions as if they were uniform with one observation at each level.
Activity 3-5 asks students to analyze some data collected on themselves and to consider the usefulness of the mean and median in this context.
Activity 4-1 introduces the basic calculations of inter-quartile range and standard deviation; it also introduces the boxplot as another visual display of a distribution. You may want to stress that the IQR is the difference between the quartiles; some students just stop with the quartiles themselves. I de-emphasize the calculation of the standard deviation in question (g) by supplying most of the steps for students. The particularly observant student might be confused in (g) because round-off errors prevent the "squared deviation" column from equaling the exact squares of the values appearing in the "deviation from mean" column. You might want to seize this opportunity to discuss round-off errors. The notation following question (h) intimidates some students, so I try to show them how the notation corresponds with the process they just completed. You may want to give students a model of a boxplot before they do question (i); the dotplot is there for comparison's sake and is not part of the boxplot as some students believe.
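For instructors who want a quick check on the by-hand arithmetic in this activity (or who need to prepare an answer key), a minimal Python sketch of the same calculations follows; the data values are made up for illustration, and note that software quartile rules can differ slightly from the book's.

    import numpy as np

    data = np.array([12, 15, 17, 20, 22, 25, 31])     # hypothetical values, not the book's data

    mean = data.mean()
    deviations = data - mean                           # the "deviation from mean" column
    squared = deviations ** 2                          # the "squared deviation" column
    s = np.sqrt(squared.sum() / (len(data) - 1))       # standard deviation, dividing by n - 1

    q1, median, q3 = np.percentile(data, [25, 50, 75]) # np.percentile's rule may differ slightly from the book's
    iqr = q3 - q1                                      # the IQR is the difference between the quartiles
    print(f"mean={mean:.2f}  median={median}  s={s:.2f}  Q1={q1}  Q3={q3}  IQR={iqr}")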
Activity 4-2 parallels Activity 3-3 in that it asks students to explore properties of the measures they've just learned about. Questions (a) and (b) should merely convince students that IQR and standard deviation do in fact measure the variability of a distribution; more variable distributions produce larger values of these measures. Questions (c)-(f) should lead students to conclude that the IQR is resistant to outliers while the standard deviation is not. Students should also realize in (f) that since the range looks only at the extremes, it is certainly not resistant.
Activity 4-3 leads students to the empirical rule. You may need to remind many students about how to read the table of counts: that one student scored 1, one scored 2, five scored 3, and so on. Some students will also need a more thorough explanation of the "plus/minus" notation. I try to emphasize that this empirical rule holds only roughly and even at that only for mound-shaped distributions.
Activity 4-4 introduces z-scores, an idea to which students return later (in Topic 14) when studying normal distributions.
For the Preliminaries data collection, you might want to bring along some coins of your own; students often carry no money with them.
Activity 5-1 tries to accomplish many goals, introducing the fundamental idea of a statistical tendency as well as the graphical technique of the side-by-side stemplot. Students enjoy this activity because they aren't afraid to expose their geographical ignorance. I encourage them to shout out a state about which they are unsure so that we can reach a class consensus. I'm careful not to tell them (until after they've completed the activity) that the "answers" concerning states' east/west status appear in Activity 5-7. I do tell students that they should end up with 26 eastern and 24 western states. Questions (e)-(g) guide students to the important concept of a statistical tendency; I try to make sure that every student understands this point.
Activity 5-2 introduces both comparative boxplots and modified boxplots. You might want to point out that the data here are already sorted, so finding quartiles by hand is not as difficult as it would otherwise be. Question (a) provides a good example of an activity for which I write the numerical answers on the board and insist that students check their calculations before proceeding. Some students need help to understand the description of the outlier test preceding question (d). Question (g) might provide a convenient time to remind students always to write in complete sentences and relate their comments to the context at hand.
Activity 5-3 follows in the vein of leading students to more open-ended problems toward the end of topics. You might want to advise students to enter the data as the coins' years and then use technology to create the "age" variable by subtracting the coins' year from the current year. You probably want to explain how to use technology to produce comparative boxplots on the same scale at this point.
Question 6 of the Preliminaries provides an excellent reminder of the importance of collecting data anonymously, for some students do not know who their fathers are and should not be forced to admit this publicly.
Activity 6-1 introduces the important idea of association and the scatterplot as a visual display of the association between two measurement variables. You might want to point out that when I ask for a scatterplot of A vs. B, I intend for A to be on the vertical axis and B on the horizontal. Unfortunately, I am not consistent myself with which variable I present first. For example, in this activity I present the column containing the independent variable first, but in Activity 6-5 I give the dependent variable first. Question (d) aims to remind students about the idea of a statistical tendency and to point out that association is a tendency.
Activity 6-2 gives students practice with judging the direction and strength of an association from a scatterplot. Some students may need help reading the first scatterplot. When I encounter students having trouble with this activity, I often ask them first to distinguish the positive from the negative associations and then to concentrate on the strengths. The "right" answer for the table appears later in Activity 7-1, but I tell students not to worry if they are off by one cell (say, switching moderate negative with least strong negative). Question (c) tests whether students can think more generally about direction and strength of associations. Many of the examples listed appear later.
Activity 6-3 tries to show that with paired data, one can learn much by inserting a "y=x" line on the scatterplot. Some students are confused by my calling it a 45 degree line, since the differing scales on the axes prevent it from appearing at a 45 degree angle.
Activity 6-4 shows how a labeled scatterplot can incorporate information from a categorical variable into the display.
Activity 6-5 gives students the opportunity to analyze genuine data from an important historical event in their lives. You may want to show students how to use technology to create scatterplots at this point. The moral in (c) and (d) is that one loses a great deal of information by discarding flights with no O-ring failures, for all of those flights occurred at relatively high temperatures.
With the idea of association having been introduced in the previous topic, this topic asks students to explore the correlation coefficient as one numerical measure of association. Moreover, it leads students to discover the fundamental properties of correlation as they progress. The emphasis again in this topic is on understanding properties and interpretations of correlation, as opposed to becoming proficient at calculating correlations by hand.
Activity 7-1 leads students to discover the basic properties of correlation. Notice that students do not work with the calculation of correlation here; they use technology to do the calculations so that they can concentrate on correlation's properties. You might want to go over questions (b)-(e) with the class to make sure that everyone has the right ideas here; I recommend doing this only after students have thought about the questions and written their own reactions to them. Some students find the outliers in the scatterplots for classes H and I difficult to see, so you might need to point them out.
Activity 7-2 is one of my favorites and most successful. It guides students to the realization that a strong association between two variables does not imply a cause-and-effect relationship between them. Since the context is so ridiculous, almost no student has any difficulty in seeing that a causal explanation is unwarranted here. Even so, I try to visit every pair of students in the class to have them explain to me in their own words the moral of this activity.
Activity 7-3 finally asks students to consider the calculation of the correlation coefficient. I view this as much less important than understanding its properties, but I want students to see the formula nonetheless. Notice that I do much of the work for the students, asking them only to fill in a few missing z-scores and cross-products. In question (c) I want students to recognize that the strong negative association causes most positive z-scores for weight to be paired with negative ones for MPG.
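If it helps to show students that the fill-in-the-blank calculation really does reproduce what the technology reports, here is a short Python sketch of the z-score and cross-product approach; the weight and MPG values are hypothetical stand-ins for the activity's data.

    import numpy as np

    weight = np.array([2.5, 3.0, 3.2, 3.6, 4.1])    # hypothetical car weights (thousands of pounds)
    mpg = np.array([34, 30, 28, 25, 21])            # hypothetical fuel efficiencies

    z_weight = (weight - weight.mean()) / weight.std(ddof=1)   # z-scores for weight
    z_mpg = (mpg - mpg.mean()) / mpg.std(ddof=1)               # z-scores for MPG

    r = (z_weight * z_mpg).sum() / (len(weight) - 1)           # average the cross-products
    print(round(r, 3), round(np.corrcoef(weight, mpg)[0, 1], 3))  # the two values should agree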
Activity 7-4 is designed to give students practice judging the value of a correlation coefficient based on a scatterplot. Students seem to have a lot of fun with it as well. Some get into contests with their partners, and others prefer to work together with their partners. I like to go around the room and guess along with the students. I have written a Minitab macro called "randcorr.mtb" to generate the "pseudo-random" data. The idea is to generate data from a bivariate normal distribution where the variables have equal means and standard deviations and where the correlation coefficient rho is chosen from a uniform distribution on the interval (-1,+1). In question (b), students invariably underestimate the value of the correlation coefficient between their own guesses and the actual correlation values.
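For instructors not using Minitab, a rough stand-in for the "randcorr.mtb" idea might look like the following Python sketch (this is not the macro itself; the sample size, mean, and standard deviation are arbitrary choices).

    import numpy as np

    rng = np.random.default_rng()

    def random_scatterplot_data(n=30, mean=50, sd=10):
        """Generate pseudo-random (x, y) pairs whose true correlation is chosen at random."""
        rho = rng.uniform(-1, 1)                   # correlation drawn uniformly from (-1, +1)
        cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
        x, y = rng.multivariate_normal([mean, mean], cov, size=n).T
        return x, y, rho

    x, y, rho = random_scatterplot_data()
    print("true rho:", round(rho, 2), " sample r:", round(np.corrcoef(x, y)[0, 1], 2))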
Activity 8-1 tries to get students thinking about the basic idea of using a line to summarize the relationship between two variables and the potential usefulness of that idea for making predictions. You might want to indicate that there is no "right" answer at this point for which line best summarizes the data. Many students might need some help with finding the slope and intercept of their line in (d) and (e). In question (f) I insist that students get used to using variable names rather than generic x and y symbols when writing the equation of a line.
Students cover a lot of ground in Activity 8-2, where they apply the least squares criterion to the issue of selecting a regression line to fit the data. You might want to make sure students see that I use the terms "least squares line" and "regression line" interchangeably. Notice that I do not give students the formula for slope and intercept based on the original observations; rather, I ask them to work with expressions involving the means, standard deviations, and correlation coefficient. They do this in (b) and (c), where some students may need considerable one-on-one help to work with the formulas. You'll want to show students how to use technology to find the regression line before they get to (c). This activity is another case where round-off errors can arise and confuse some students. In (d) and (e) I expect students to use the equation of the least squares line to calculate the predictions, not just to estimate predictions visually. Question (i) aims to warn students of the danger of extrapolation. Round-off errors committed by students can affect questions (j) and (k), which try to illustrate the interpretation of the slope coefficient. I intend students to answer (m) and (n) based on the regression equation but to address (o) based solely on the values given in the table; in this way students should better understand the relationship between fitted values and residuals. Questions (t)-(w) lead students to look at r-squared as the proportion of variability in air fares explained by knowing distance; many students struggle with this idea and fail to understand (w) even in light of the paragraph preceding it.
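Since the activity builds the regression line from the means, standard deviations, and correlation rather than from raw-data formulas, a quick arithmetic check can be handy; here is a minimal sketch with placeholder summary values that you would replace with the ones from the activity.

    # Least squares line from summary statistics: slope = r * (s_y / s_x), intercept = ybar - slope * xbar
    x_bar, s_x = 1000.0, 600.0       # placeholder mean and SD of distance
    y_bar, s_y = 170.0, 75.0         # placeholder mean and SD of air fare
    r = 0.80                         # placeholder correlation coefficient

    slope = r * (s_y / s_x)
    intercept = y_bar - slope * x_bar
    print(f"predicted fare = {intercept:.2f} + {slope:.4f} * distance")

    distance = 300                   # predictions come from the equation, not from eyeballing the plot
    print("prediction:", round(intercept + slope * distance, 2))
    print("r-squared:", round(r ** 2, 3))   # proportion of variability in fare explained by distance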
Activity 8-3 gives students the chance to perform a small-scale regression analysis on data collected in class. Question (f) is another way of looking at the perils of extrapolation.
Activity 9-1 starts off in (a)-(c) with a review of basic regression ideas that students learned in Topic 8. Question (d) breaks new ground by asking students to look at a scatterplot of residuals vs. longevities. This scatterplot reveals a "megaphone" pattern, indicating that the line better predicts gestation periods for animals with shorter longevities than for animals with greater longevities. Students should realize in (e) that the elephant is an outlier in both variables but does not have the biggest residual. Questions (f)-(h) guide students to find that the giraffe has the largest residual because its gestation period is much longer than expected, but removing the giraffe from the analysis has little effect. Questions (i)-(k) reveal that removing the elephant does have a substantial effect on the analysis, thus introducing students to the idea of an influential observation.
Activity 9-2 helps students to see that a reasonably strong correlation does not guarantee that a straight line fits the data well. Students should find a clear pattern to the regression line in (e) that indicates this as well.
Students explore the idea of data transformations, one of the more challenging mathematical ideas to appear in the book, in Activity 9-3. Some students will need help understanding how to calculate logarithms in (b). In question (d) you might want to remind students to use "log ( people per tv )" when they write out the regression equation. Students should discover a much better fit with the transformed data than with the original data.
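If your technology does not create transformed variables conveniently, the transformation step is only a line or two of code; here is a sketch with hypothetical values standing in for the activity's data.

    import numpy as np

    people_per_tv = np.array([1.3, 24.0, 1.0, 592.0, 8.8])   # hypothetical values
    y = np.array([76.5, 64.0, 75.5, 51.0, 68.2])             # hypothetical response values

    log_ppt = np.log10(people_per_tv)                 # the transformed explanatory variable
    slope, intercept = np.polyfit(log_ppt, y, 1)      # least squares fit on the transformed data
    print(f"y-hat = {intercept:.1f} + {slope:.1f} * log(people per tv)")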
Activity 10-1 is intended primarily to make sure that students understand the relationship between the two-way table and the raw data. I used to take for granted that they understood this, but now I think that it's an important exercise.
Activity 10-2 gives students an extended example in which to learn how to analyze two-way tables. You might want to check especially carefully that students recognize the differences among questions (i)-(k).
Activity 10-3 tests whether students can read proportions (approximately) from a segmented bar graph. Some students will need a helpful nudge to answer question (c).
Activity 10-4 leads students to discover Simpson's paradox. The hypothetical example is contrived enough that most students recognize that hospital A is the better hospital despite its lower survival rate because it treats most of those in poor condition, who are naturally less likely to survive than those in good condition. Some students take for granted the observation that those in poor condition are less likely to survive than those in good condition, but I emphasize this aspect as well as hospital A's treating most "poor condition" patients. Some students are unsure in question (f) about which hospital they would prefer, but most recognize the superiority of hospital A.
Activity 10-5 addresses a fairly subtle point. It's easy to conclude that the test is worthwhile based on the 63 of 100 cases in which a person predicted to stay actually does stay. That questions (a) and (b) produce the same answer, though, is supposed to convince students that the test prediction provides no valuable information. Many students need some prodding to understand what the segmented bar graphs requested in (e) and (f) should look like.
Activity 11-1 provides some examples of biased sampling designs; you might want to cover these questions as a class discussion. For the Literary Digest example, I try to get students to identify at least two major sources of bias: that owners of cars and phones during the Depression tended to be more wealthy and Republican and that those who take the time to write in are typically less happy with the incumbent and the status quo than those who decline to write in. Also in this activity I try to emphasize in no uncertain terms that appreciating the distinction between population and sample is absolutely critical to understanding statistical inference, which is the subject of much of the remainder of the course.
Activity 11-2 asks students to conduct simple random sampling using a table of random digits with the 100 members of the U.S. Senate. I don't think you can emphasize strongly enough that this is one of those rare situations in which one actually knows details of the population. I usually describe for the entire class how to read the table of random digits and then let them take their samples individually. In question (f), you might want to remind them that the population information is given earlier in the activity. Since it's impossible to find 7% women or 56% Democrats in a sample of 10, all students should answer "no" to question (f). In question (g), though, I hope that they recognize that this "no" response does not mean that the sampling method is biased in the sense of systematically favoring one group over another. To make this point more clear you might want to pool the class results (for proportions of Democrats, say) and draw a dotplot on the board. You should find, of course, that the distribution extends fairly evenly on either side of the population value (.56). In question (i) I again emphasize that it's critical for students to understand the distinction between parameter and statistic. It's also worth repeating at this point that this contrived situation is a rare one in which you know the values of population parameters.
Activity 11-3 asks students to use technology to take repeated samples of Senators and to examine the distribution of their sample results. I have written a Minitab macro called "senators.mtb" which does this by sampling without replacement from the columns containing the Senators' information. You might need to remind some students to record the sample proportion (not number) of Democrats. In question (e) students are to switch from samples of size 10 to samples of size 40. Unless they get a very unusual set of samples, they should find less variability in the sample results with the larger sample. Since this is a fundamental phenomenon that will come up again throughout the course, you might want to draw the attention of the class to this point.
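For instructors using technology other than Minitab, a rough Python counterpart to "senators.mtb" follows (not the macro itself). It samples without replacement from a simulated population of 100 senators containing 56 Democrats, which matches the population proportion cited above; the number of repeated samples is arbitrary.

    import numpy as np

    rng = np.random.default_rng()
    population = np.array([1] * 56 + [0] * 44)   # 1 = Democrat, 0 = not; 56% Democrats overall

    def sample_proportions(sample_size, num_samples=200):
        """Take repeated simple random samples (without replacement) and record each sample proportion."""
        return np.array([rng.choice(population, size=sample_size, replace=False).mean()
                         for _ in range(num_samples)])

    for n in (10, 40):
        props = sample_proportions(n)
        print(f"n={n}: mean of sample proportions={props.mean():.3f}, SD={props.std(ddof=1):.3f}")
        # the SD should be noticeably smaller for n = 40 than for n = 10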
Many students find that the material gets substantially tougher at this point, so you might want to warn them not to slack off and even to put forth a stronger effort. Most students enjoy sampling candies, but you might want to focus their attention on the statistical principles being illustrated here.
You need to bring Reese's Pieces candies to class for this topic. These candies come in three colors: orange, brown, and yellow. Two or three one-pound bags suffice for a class of 24 or so.
Activity 12-1 leads students through candy sampling, which should be done together since question (g) calls for the pooling of class data. Questions (b)-(e) are critical to understanding what statistical inference is all about, so I suggest that you make those answers exceedingly clear to students. For question (g) I go around the room and ask students to report their sample proportion of oranges while I create the dotplot on the board.
Activity 12-2 moves to using technology to simulate the same process. I have written a Minitab macro called "reeses.mtb" to do this, but it amounts to little more than sampling from a binomial distribution. I don't think you can emphasize enough to students that we have to assume a certain value for the population proportion in order to make the computer run the simulation. In question (b) I expect students to see a roughly mound-shaped distribution (I haven't introduced the term "normal" yet) that is centered around the actual population proportion of .45. Question (f) is key to the notion of confidence; many students struggle to see that the answer to (f) is the same as the middle percentage in the table of (e). Questions (h)-(l) investigate the effect of sample size. In question (i) be prepared to warn students that the scale on the display has probably changed but that the distribution is indeed less spread out than before. Students should discover in (j)-(l) that the larger sample size produces more samples with proportions close to the population value. Questions (m)-(o) specifically introduce the idea of 95% confidence. Questions (p) and (q) ask students to verify that the familiar expression for the standard deviation of a sample proportion is reasonable, based on its closeness to their simulated findings. Some students tend to become enamored of this standard deviation formula as if it specifies the entire sampling distribution; I try to remind them that the shape and center of the sampling distribution are just as noteworthy.
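A bare-bones Python version of what "reeses.mtb" amounts to (sampling from a binomial model with an assumed population proportion of .45 orange) is sketched below; the two sample sizes and the number of repetitions are illustrative, not the ones built into the macro.

    import numpy as np

    rng = np.random.default_rng()
    p = 0.45                               # assumed population proportion of orange candies

    def simulate_proportions(n, reps=500):
        """Simulate `reps` sample proportions of orange candies from samples of size n."""
        return rng.binomial(n, p, size=reps) / n

    for n in (25, 75):                     # illustrative sample sizes
        props = simulate_proportions(n)
        formula_sd = np.sqrt(p * (1 - p) / n)          # the familiar sqrt(p(1-p)/n) expression
        print(f"n={n}: simulated SD={props.std(ddof=1):.3f}, formula SD={formula_sd:.3f}")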
You should bring dice (ordinary, six-sided dice or number cubes) to class for this topic, preferably one die for each student.
Activity 13-1 follows in the tradition of conducting physical simulations before computer ones. This should be done together as a class since question (i) calls for pooling of the data. It's critical to help students realize that we are fixing the simulation so that one-third is the rate of defective items coming off the assembly line. Questions (a) and (b) reinforce the crucial distinction between parameter and statistic, while (c) and (d) remind students of the phenomenon of sampling variability. For question (i) I again go around the room and ask students to report their sample proportions of defectives while I create the dotplot on the board. Question (k) introduces the fundamental idea of significance. With the technology simulation in (l), some students will need to be reminded about how to read and construct a table of tallies. I have written a Minitab macro called "widgets.mtb" to do this simulation, but it entails little more than sampling from a binomial distribution. Questions (n) and (o) then return to the crux of the concept of significance.
Activity 13-2 asks more questions about the computer simulation performed in Activity 13-1. Students should note that finding two or fewer defectives would be unusual if the process were still producing one-third defective and that finding no defectives would be extremely unusual if the process were still producing one-third defective. This is a point at which students might need to be reminded that the one-third is the population proportion and the 2/15 or 0/15 the sample proportion.
Activity 13-3 presents simulation results to students and asks them to interpret the results in a significance context. Some students still have trouble understanding what the counts of the histogram represent, so be ready to tell them that one test had 0 correct identifications, fifteen tests had 1 correct identification, and so on. Question (a) might be perceived as a "trick" question since it's impossible to identify exactly 25% of 30 cards correctly (so the answer is 0), but its aim is to sharpen the distinction between parameter and statistic in students' minds. Questions (c)-(e) try to help students connect the notion of "how surprising" a sample result is with how often it would occur in the long run. Question (f) then sees if they can put those ideas together and recognize that since it would be very unlikely to get 17 or more right by sheer guessing, such a result constitutes pretty strong evidence of special ability.
This topic lends itself to working through with the class as a whole, at least through the basics of reading the table and standardizing. You will find that Table I lists areas under the standard normal curve to the left of the tabulated values.
Activity 14-1 tries to convince students of the importance of normal distributions by reminding them that they have encountered mound-shaped distributions both with real data and with simulated sampling distributions. In (a) I expect students to comment on the mound shapes and symmetry of the distributions. Question (c) aims to illustrate how to read the mean and standard deviation of a normal curve from its sketch. This activity also introduces the term "probability" in a loose sense.
Activity 14-2 guides students through the necessary skills of reading the standard normal table. I insist that they shade in areas under the curve as well as read the table here. Question (a) involves nothing more than looking up the tabled value. Question (b) gives the same answer, showing that one can ignore strict vs. non-strict inequalities with normal distributions. You might want to caution students not to ignore this distinction in other settings, however. In (c) students can either subtract the tabled value for 0.68 from 1 or take advantage of symmetry by looking up -0.68. Question (d) is straightforward, but (e) is trickier. Some will be tempted to subtract (d) from (c) rather than from (a) as they answer (e). Question (f) illustrates the limitation of the table; I want students to answer "less than .0002" here. Reading the table "in reverse" is the object of (g) and (h); again I insist on students' shading in areas under the curve to help them see what's going on here.
Activity 14-3 introduces standardization as the technique for using the standard normal table to find areas with generic normal distributions. You might remind students that they already encountered standardization and z-scores back in Topic 4 when they compared SAT and ACT scores. One point that students often miss is that standardization simply rescales but leaves the shape of the curve (and the area of the shaded region) unchanged. I try (admittedly with little success) to get students to use proper notation for these calculations, but they too easily lapse into sloppy, incomplete notation. Question (e) is where I let the students begin to work on their own or in pairs. The wording of question (g) confuses some, and many find (h) quite challenging since it is their first exposure to reading the table "in reverse" in conjunction with standardization.
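Where students have technology handy, the table look-ups and standardizations can be checked with a few lines of code; here is a sketch using scipy's normal distribution, with generic values rather than the ones in the activities.

    from scipy.stats import norm

    # Area under the standard normal curve to the left of a z value, as in Table I:
    print(round(norm.cdf(-0.68), 4))           # area below z = -0.68
    print(round(1 - norm.cdf(0.68), 4))        # the same area, found by symmetry

    # Standardizing a value from a generic normal distribution: z = (x - mu) / sigma
    mu, sigma, x = 500, 100, 620               # placeholder mean, SD, and value of interest
    z = (x - mu) / sigma
    print(round(norm.cdf(z), 4))               # probability of falling below x

    # Reading the table "in reverse": the z value with a given area to its left
    print(round(norm.ppf(0.90), 3))            # z cutting off the top 10%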
Activity 15-1 introduces students to calculations based on the CLT and tries to convince them that the results nicely match their earlier findings from simulations. In question (a) many students focus solely on the formula for standard deviation, ignoring the reminder to consider the shape and center of the distribution as well. You may notice that I do not use the continuity correction in these calculations: I fear that the details of that procedure would divert students' attention from the larger issues. I try to insist on students' using proper notation in (c) and (d); their answers to (d) and (e) should at least be in the ballpark of each other.
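A worked sketch of the kind of CLT calculation this topic calls for, with placeholder values for the population proportion, sample size, and sample proportion of interest (no continuity correction, in keeping with the activities):

    from math import sqrt
    from scipy.stats import norm

    theta = 0.45       # placeholder (assumed) population proportion
    n = 75             # placeholder sample size
    p_hat = 0.52       # placeholder sample proportion of interest

    # CLT: p-hat is approximately normal with mean theta and SD sqrt(theta(1 - theta)/n)
    sd = sqrt(theta * (1 - theta) / n)
    z = (p_hat - theta) / sd
    print(f"SD = {sd:.4f}, z = {z:.2f}, P(p-hat >= {p_hat}) is about {1 - norm.cdf(z):.4f}")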
Activity 15-2 asks students to perform CLT calculations with an eye toward significance. In (a) they again must remember to comment on shape and center as well as variability; then they should come up with a reasonable sketch in (b). I encourage students to complete the guess in (c) as a check on their future work. The answers to (d) and (e) should agree remarkably well, but that does not guarantee that students will see the point of (f): that getting 38% or more correct is reasonably but not overwhelmingly surprising if the subject is just guessing.
Activity 15-3 parallels Activity 15-1 but with a larger sample size. Questions (d) and (e) are meant to refute students' inclinations to argue that a larger sample size always produces smaller probabilities or always produces larger probabilities. The relevant question is, of course, "probability of what?", so students find in (d) that the larger sample size produces a smaller probability and in (e) that the larger sample size produces a larger probability. This frustrates some students, but I urge them to start with the fact that larger sample sizes produce less variability and then reason from there.
It's a good idea to bring pennies to class for the spinning experiment in the Preliminaries; I've heard that newer pennies spin more consistently than older ones. I check to make sure that students understand that I'm asking for an interval of values in questions 5 and 6. You might also want to draw students' attention to question 12, which asks them to collect data on their peers outside of class.
Activity 16-1 simply introduces the goal of confidence intervals and reminds students yet again of parameters and statistics. It then presents students with the terminology and formula for confidence intervals, points that you probably want to cover with the class as a whole. You might need to remind students again about "plus/minus" notation and what it means.
Activity 16-2 then leads students to find critical values from the standard normal table. I point out that for commonly used confidence levels, the book provides the critical values. Thus, I tell students that they only have to find the critical values from scratch if they are working with an uncommon confidence level.
Activity 16-3 asks students to construct a confidence interval for the first time. Some will need help with the mechanics of doing this in (a). In (e) I have in mind looking at the "plus/minus" term from (a); this point probably warrants mentioning to the whole class.
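For checking the by-hand interval in (a), or for producing intervals quickly during class, here is a minimal sketch of the confidence interval formula with placeholder sample values.

    from math import sqrt
    from scipy.stats import norm

    n, successes = 400, 172                   # placeholder sample size and count of "successes"
    p_hat = successes / n
    confidence = 0.95
    z_star = norm.ppf(1 - (1 - confidence) / 2)           # critical value (1.96 for 95%)

    half_width = z_star * sqrt(p_hat * (1 - p_hat) / n)   # the "plus/minus" term
    print(f"p-hat = {p_hat:.3f}")
    print(f"interval: ({p_hat - half_width:.3f}, {p_hat + half_width:.3f}), half-width = {half_width:.3f}")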
Activity 16-4 uses computer simulations to illuminate the meaning of confidence intervals. Again it's worth emphasizing that for the purposes of simulation, one has to assume a certain value for the population proportion that would never actually be known in practice. I have written a Minitab macro called "confsim.mtb" to perform the simulation; it relies on generating random data from a binomial distribution. Students' answers to (b) should naturally be in the vicinity of 95%. The point of (c) is to help students see that the occasions for which the interval fails to contain the parameter are precisely those occasions when the sample statistic falls far from the parameter value. The answer to (d) being "no", students are to explain in (e) that the procedure generates an interval containing the population parameter 95% of the time in the long run. I encourage students to read carefully the comments on interpreting confidence intervals which follow this activity.
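A rough Python counterpart to "confsim.mtb" (not the macro itself) is sketched below for instructors using other software; the assumed population proportion, sample size, and number of intervals are arbitrary choices.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng()
    theta, n, reps = 0.45, 100, 1000      # assumed parameter, sample size, number of intervals
    z_star = norm.ppf(0.975)              # for 95% confidence

    p_hats = rng.binomial(n, theta, size=reps) / n
    half_widths = z_star * np.sqrt(p_hats * (1 - p_hats) / n)
    covered = (p_hats - half_widths <= theta) & (theta <= p_hats + half_widths)
    print("proportion of intervals containing theta:", covered.mean())   # should be near .95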
Activity 16-5 leads students to explore the effect of confidence level on a confidence interval. I ask them to use technology to produce the intervals in order to speed the process along and free them to concentrate on the effect. I have written a Minitab macro called "propinf.mtb" to facilitate this process since Minitab has no built-in command for testing proportions. Students should find that the intervals widen as the confidence level increases. I try to convince skeptical students that this result makes sense, for to be more confident one must allow more room for error.
Activity 16-6 parallels Activity 16-5 by guiding students to explore the effect of sample size on the confidence interval. Students should find that increasing the sample size causes the confidence intervals to become narrower. Questions (d) and (e) address the rate at which this narrowing occurs, for quadrupling the sample size cuts the half-width in half. Some students may miss this point at first due to rounding discrepancies.
The first question of the Preliminaries refers to the information that students were to have collected on their peers outside of class.
Activity 17-1 begins by asking students to calculate a confidence interval by hand. While this should be familiar at this point, some students still take quite a while with this. In (b) I want students to mention the assumption of random sampling, but they should comment on their own that the sample size is large enough. Many students need help with the subtle points about interpreting confidence intervals in (e), where the respective answers are False, False, True, and False. In (g) many students calculate the interval itself without realizing that "margin-of-error" just refers to the half-width. Questions (i) and (j) try to make the point that the margin-of-error increases for subgroups since they necessarily have a smaller sample size.
Activity 17-2 introduces the idea of determining a sample size before conducting the study. Students make three common mistakes in (a): they try to use the entire confidence interval formula and not just the half-width piece, they make algebraic errors in manipulating the expression, and they permit round-off errors in intermediate calculations to affect the final result. In (b) and (d) I try to stress the intuitive nature of these questions: to be more accurate you need a larger sample and to be more confident you need a larger sample. I advocate resisting the temptation to do the algebra for the students and give them the formula for determining the sample size; I like to think that students can figure that out from understanding the expression for confidence intervals in the first place. I insist that students round up to produce a whole number for their answers here, since one can not interview a fraction of a person. Question (f) tries to illustrate that the size of the population (as opposed to the size of the sample) plays no role. In question (h) I look for students to respond that one must interview the entire population to achieve perfect accuracy with 100% confidence.
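The algebra students are meant to work out starts from the half-width expression and solves for n; here is a sketch of the resulting calculation, with a planning value for the proportion left as an argument (whether you have students use .5 or some other value is up to you).

    from math import ceil
    from scipy.stats import norm

    def required_sample_size(margin, confidence, p_guess=0.5):
        """Smallest n whose half-width is at most `margin`: n = p(1-p) * (z*/margin)^2, rounded up."""
        z_star = norm.ppf(1 - (1 - confidence) / 2)
        return ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2)

    print(required_sample_size(0.03, 0.95))   # more accuracy (smaller margin) requires a larger sample
    print(required_sample_size(0.03, 0.99))   # more confidence also requires a larger sample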
Activity 17-3 is another of my favorites in that it uses a ridiculous context to make an important point. Students should calculate the confidence interval in (a) but recognize immediately in (b) that the interval is ludicrous for estimating the proportion of females among all humans. In (c) watch for students who argue that this is just one of those 5% of intervals that do not contain the parameter; the problem here involves the horribly biased method of data collection. Even in question (d) the interval is not sensible, for we know with certainty the proportion of women in the population of 1994 U.S. Senators.
Activity 17-4 addresses the issue of to what population sample results can be generalized. In most cases class results concerning credit card ownership probably do not generalize very far.
Activity 18-1 introduces the reasoning and structure of tests of significance. Questions (a)-(e) are straightforward for most students, but (f) requires that they remember the Central Limit Theorem from Topic 15. A key point to stress in (f)-(i) is that these calculations all assume that the subject is just guessing. Questions (g)-(i) in particular try to help students make the connection between how often a sample result would occur by chance in the long run and how convinced one would be that the subject did better than sheer guessing. Following the activity is a lay-out of the structure of significance tests. I certainly recommend that you present a "mini-lecture" at this point explaining these various components. I try to convince students that if they really understand what a p-value means, then they understand the essentials of the testing process. You might want to caution students that the p-value is quite different from p-hat.
Activity 18-2 leads students step-by-step through the testing process. In (a) and (b) I stress the importance of writing the hypotheses in words as well as symbols so that students genuinely understand what they are testing. Most students balk at (e), so be prepared to describe for the class what the p-value means in this context. If students struggle with putting their conclusion into words in (f), you might also want to demonstrate a well-written conclusion. Questions (g)-(i) are relatively straightforward; you might want to note for the students that "rejecting the null hypothesis" at a certain level is virtually synonymous with the "sample result being statistically significant" at that level.
Activity 18-3 asks students to investigate the effect of sample size on tests of significance. They should discover that a sample proportion of 54% may or may not be statistically significantly greater than one-half. Having students use technology to perform the test facilitates this investigation; the Minitab macro "propinf.mtb" is helpful here. For larger sample sizes this result is statistically significant, while for smaller sample sizes it is not. Question (a) is yet another reminder of the importance of keeping parameters and statistics straight in one's mind. Question (b) is critical, for students can not understand the larger point about sample sizes without realizing what hypotheses the test is assessing. I propose using technology to do these calculations so that students can concentrate on the sample size issue.
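To let students (or you) see the sample-size effect quickly, the test can also be scripted directly; here is a sketch of the one-proportion z-test for a sample proportion of 54% against a null value of one-half, run at several illustrative sample sizes (not necessarily the ones in the activity).

    from math import sqrt
    from scipy.stats import norm

    p0, p_hat = 0.50, 0.54            # null value and observed sample proportion

    for n in (100, 400, 1600):        # illustrative sample sizes
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
        p_value = 1 - norm.cdf(z)     # one-sided test of "greater than one-half"
        print(f"n={n}: z={z:.2f}, p-value={p_value:.4f}")
    # the same sample proportion is insignificant for small n and highly significant for large n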
Activity 19-1 asks students to examine two-sided tests. Students should find the same p-value in (d) and (e) due to the symmetry of the sample results in those cases. In (h) they should find the one-sided p-value to be half of the two-sided p-value, and in (i) the one-sided p-value should be one minus the other one-sided p-value. Round-off errors may obscure these points for students, however. My intention in (j) is for students to recognize that the test is not necessary if the sample data clearly do not support the alternative hypothesis at all. The table asked for in (k) should summarize the relationships among these one- and two-sided p-values.
Activity 19-2 aims to lead students to see the connection between tests and intervals. Students should find that if the 95% confidence interval contains a certain value, then a two-sided test involving that value will fail to reject the null hypothesis at the .05 level. On the other hand, values not contained in the confidence interval are rejected by the test.
Activity 19-3 tries to help students see that fixed significance levels should not be treated as sacred. Questions (a) and (b) have very similar sample results but have p-values on opposite sides of .05. Questions (b) and (c) have very different sample results but have p-values on the same side of .05. Many students obtain the correct results but do not trust themselves enough to answer (d) and (e) correctly. The moral is supposed to be that p-values are much more informative than simple statements of "significance" or not.
Activity 19-4 guides students to discover the important distinction between practical and statistical significance. The p-value of the test in (c) establishes that the sample proportion differs statistically significantly from 30%, but the confidence interval in (d) reveals that the difference is extremely modest in practical terms. I try to convince students that the test and interval are in complete agreement here: as in Activity 19-2 the test rejects the value which is not in the confidence interval. More importantly, though, the confidence interval reveals more information than does the test of significance.
Activity 19-5 constitutes yet another reminder to students that statistical inference procedures all depend on good sampling methods in order to have any validity at all. Both the test and the interval give extremely misleading information here because the sampling method was so biased against Roosevelt.
The Preliminaries provide a convenient point at which to remind students that they should be serious and thoughtful in addressing these questions but that they are not expected to be able to answer them as knowledgeably as if they had already studied the topic.
Activity 20-1 introduces some terminology related to experimental design (explanatory and response variables, confounding variables), emphasizing the fundamental principle of control. It also aims to help students see the need for a comparison group as a primary way to establish some control in an experiment.
Activity 20-2 tries to make the point that use of a comparison group does not guarantee a well-designed study. It hopes to lead students to see randomization as a second way to achieve control in an experiment.
Activity 20-3 stresses the role played by blindness, both single- and double-, in designing a good experiment.
Activity 20-4 asks students to pull together the principles of comparison, randomization, and blindness by writing about how to implement all three in an experimental design. If you have been leading a class discussion of the earlier activities, I suggest stopping that at this point to let students write about this experiment.
Activity 20-5 wants to show students that observational studies are sometimes the only options available to researchers and that they come in many types and can provide valuable information. Again I recommend having students write about question (a) before you discuss it as a class.
Activity 20-6 asks students to describe in detail how they might design an experiment to address a particular (silly) question. I like to encourage students to go into a great deal of detail here to show them the difficulty of establishing a sound protocol for gathering experimental data.
Students often have a great deal of trouble seeing the connection between the results of the simulation and the conclusion of a significance test. They fail to see the complementary nature of the two analyses and instead regard them as different and unrelated things. I suggest that you try to drive home the point about this connection by using the simulation to make clear what the p-value of the test actually means.
You will need to bring cards to class to use for the simulation. I use index cards that I have pre-marked with "D" and "R" for deaths and recoveries, respectively. You might also use ordinary playing cards with black cards representing deaths and red cards recoveries.
Activity 21-1 presents a simulation analysis to determine if two sample proportions differ "significantly". While students perform the simulation called for in (b), you might want to emphasize just what the simulation is doing: investigating how often one would get 7 or more recoveries with the new treatment even if the new treatment were no better than the old (which is what assuming that 12 would recover regardless of treatment group entails). You should also reiterate the point that if the observed result is unlikely to occur by chance alone, that indicates strong evidence that the new treatment is genuinely superior to the old. For the technology-generated simulation in (h), I have written a Minitab macro called "twopropsim.mtb". Question (j) is a good one for separating those students who understand the idea of significance from those who do not. A result as extreme as the one in (j) would happen rarely by chance, so it constitutes strong evidence of the superiority of the new treatment.
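For instructors who want to run the shuffling on a computer without Minitab, a rough Python sketch of the same idea follows (this is not "twopropsim.mtb" itself, and the group sizes below are placeholders to be replaced with the activity's actual counts); it deals the 12 "recovery" cards at random and counts how often 7 or more land in the new-treatment group by chance alone.

    import numpy as np

    rng = np.random.default_rng()

    n_new, n_old = 11, 11          # placeholder group sizes; substitute the activity's counts
    total_recoveries = 12          # recoveries assumed to occur regardless of treatment group
    observed_new = 7               # recoveries actually observed in the new-treatment group

    # "Cards": 1 = recovery, 0 = death, for all patients combined
    cards = np.array([1] * total_recoveries + [0] * (n_new + n_old - total_recoveries))

    reps = 1000
    count = 0
    for _ in range(reps):
        rng.shuffle(cards)                        # deal the cards at random into the two groups
        if cards[:n_new].sum() >= observed_new:   # 7 or more recoveries in the new-treatment group
            count += 1
    print("proportion of shuffles with", observed_new, "or more:", count / reps)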
Activity 21-2 asks students to perform a formal test of significance on experimental data that they have encountered before. This provides another good opportunity to emphasize the distinction between parameter and statistic and to review the notational differences between the two. In question (c) some students will be tempted to average the two sample proportions rather than to compute the combined sample proportion; while this distinction is negligible here it can be important when the sample sizes differ markedly. Many students are baffled by question (f), which merely asks them to report the p-value of the test and recognize what it means.
Activity 21-3 again has students investigate the effect of sample size on tests of significance. Students are to discover that the distinction between 60% and 70% success rates becomes more and more significant as the sample sizes increase. In (b) and (c) I have in mind that sample sizes of 10 or so in each group would not be very conclusive, while sample sizes of 1000 or so in each group would. While students could perform these tests by hand, I strongly recommend the use of technology to free them to concentrate on the principle involving sample sizes at work here. I have written a Minitab macro called "twopropinf.mtb" which performs these tests and also constructs confidence intervals.
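If you prefer a quick script to the macro, here is a sketch of the two-proportion z-test for success rates of 70% and 60% at several equal group sizes (the sizes are illustrative); the p-value shrinks steadily as the groups grow.

    from math import sqrt
    from scipy.stats import norm

    for n in (10, 100, 1000):                           # illustrative group sizes
        x1, x2 = round(0.70 * n), round(0.60 * n)       # success counts at 70% and 60% rates
        p1, p2 = x1 / n, x2 / n
        pooled = (x1 + x2) / (2 * n)                    # combined sample proportion
        z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n + 1 / n))
        p_value = 2 * (1 - norm.cdf(abs(z)))            # two-sided p-value
        print(f"n={n} per group: z={z:.2f}, p-value={p_value:.4f}")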
In Activity 22-1, question (a) is straightforward for most students, but (b) requires some thinking about the meaning of an entirely negative interval, an entirely positive interval, and an interval containing zero. The key, I think, is to stress what the interval estimates: the difference in population proportions. In (d) students should recognize that this interval is the negative of the interval from (a).
Activity 22-2 aims both to give students practice with the significance test and confidence interval and to remind them that the design of a study determines to a large extent the nature of the conclusion one can draw from it.
Activity 22-3 is another of my favorites in that it forces students to reconsider a lesson learned much earlier in the course. The significance test in (a) reveals a highly significant difference in the sample proportions of acceptance, but the explanation is not discrimination but rather Simpson's paradox. For whatever reason, men tended to apply to the programs that were easier to get into, while women tended to apply to the tougher programs. This activity again makes the point that the design of a study (in this case an observational study) is a very important consideration when interpreting data.
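If you want to show the reversal with numbers on the board, counts along the following lines work; they are invented solely to reproduce the paradox and are not the study's actual figures:

    # Invented admission counts, given as (admitted, applied); not the actual data.
    men   = {"easier program": (80, 100), "tougher program": (5, 25)}
    women = {"easier program": (18, 20),  "tougher program": (21, 100)}

    for name, group in (("men", men), ("women", women)):
        admitted = sum(a for a, n in group.values())
        applied  = sum(n for a, n in group.values())
        print(name, "overall:", round(admitted / applied, 2))
        for program, (a, n) in group.items():
            print("   ", program + ":", round(a / n, 2))

With these counts the women come out ahead within each program yet behind overall, simply because so many more of them applied to the tougher program.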
The data collection in the Preliminaries raises some interesting issues. Again I favor collecting the data anonymously for those students who consider their sleeping habits private. Converting from bedtimes and waketimes to sleeping times in minutes is a challenging exercise for some. You might want to discuss how to do that with technology, but I usually find it easier just to have the students do the calculations by hand or in their heads.
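Should you opt for the technology route after all, the main wrinkle is handling bedtimes that fall before midnight. A small Python function along these lines (the time format and sample times are just illustrations) takes care of it:

    from datetime import datetime, timedelta

    def minutes_slept(bedtime, waketime):
        # times given as "HH:MM" on a 24-hour clock; a waketime at or before
        # the bedtime is taken to fall on the next day
        bed  = datetime.strptime(bedtime, "%H:%M")
        wake = datetime.strptime(waketime, "%H:%M")
        if wake <= bed:
            wake += timedelta(days=1)
        return int((wake - bed).total_seconds() // 60)

    print(minutes_slept("23:30", "07:15"))   # 465 minutes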
Activity 23-1 reviews once again the distinction between parameters and statistics. Of special importance here is re-acquainting students with measurement variables, for it has been quite a while since they have studied them in detail. Reviewing the notation for means and standard deviations is also a goal here. You might want to stress how these parameters and statistics differ from their binary counterparts studied in previous units.
Activity 23-2 tries to lead students to recognize the features of a distribution that play a role in making inferences about a population mean. It also tries to tap into their intuition about how those features work. Many students struggle with these ideas at first, so you might want to hold a class discussion early on, once students have had a chance to consider the questions. I have in mind that sample 2 (with its smaller sample size and larger variability) would do the worst job of estimating the population mean and sample 3 (with its larger sample size and smaller variability) would do the best job. In (k) students should recognize that the sample size and sample variability (as measured by the sample standard deviation) play a role in making inferences about a population mean; they should also mention the more obvious but easily forgotten sample mean.
I recommend describing the t-distribution to the class and illustrating the use of the t-table before asking students to work on Activity 23-3. This activity gives students practice reading critical values from the t-table, a skill that they need in constructing confidence intervals.
Similarly, Activity 23-4 provides students with practice using the t-table to find p-values necessary in tests of significance. In questions (b)-(h) I intend for students to express their answers as a range of values, such as "between .025 and .01". You might want to show students how to use technology to calculate these p-values more exactly.
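As one possibility for that more exact calculation (here in Python; the 14 degrees of freedom and the t statistic of 2.30 are made-up values), the same package also supplies the critical values needed in Activity 23-3:

    from scipy.stats import t

    print(t.ppf(0.975, df=14))   # critical value t* for a 95% interval, about 2.145
    print(t.sf(2.30, df=14))     # one-sided p-value, about .02 -- between .025 and .01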
Activity 23-5 brings students back to the data of Activity 23-2, now using technology to construct the confidence intervals and perform the tests of significance. The primary goal here is to reinforce (or rectify, as appropriate) their intuitive hunches expressed in Activity 23-2. You will need to show students how to perform these t-procedures with their technology; most software packages have built-in commands for these tests and intervals.
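As one illustration of such built-in commands, here is how the one-sample procedures look in Python, with a small made-up sample standing in for the activity's data and a made-up null value of 420:

    import numpy as np
    from scipy import stats

    sample = np.array([420, 380, 465, 510, 400, 445, 390, 480])   # made-up data

    print(stats.ttest_1samp(sample, popmean=420))                 # test of H0: mu = 420

    mean, se = sample.mean(), stats.sem(sample)
    print(stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=se))   # 95% CI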
Activity 23-6 finally asks students to carry out these inference procedures by hand with real data. With question (a) I want to remind students to do exploratory analyses of data before proceeding to inferential analyses. Question (b) reminds students that they can use technology to assist with calculations, even though I go on to ask them to carry out the inference procedures by hand in (c) and (d). Questions (e)-(g) ask students to consider the effects of sample size, sample standard deviation, and sample mean; in (g) some students fail to realize that the interval would change (shifting down) even though its width would remain the same. Question (i) tries to remind students of the importance of random sampling to ensure the validity of inference results.
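For reference, the by-hand calculations in (c) and (d) amount to nothing more than the formulas for the t statistic and the t interval; a sketch with a made-up sample and null value:

    import numpy as np
    from scipy.stats import t

    data = np.array([12.1, 9.8, 11.4, 10.6, 13.0, 10.9])   # made-up sample
    n, xbar, s = len(data), data.mean(), data.std(ddof=1)

    mu0 = 10                                      # made-up null value
    t_stat = (xbar - mu0) / (s / np.sqrt(n))      # t = (xbar - mu0) / (s / sqrt(n))
    p_value = 2 * t.sf(abs(t_stat), df=n - 1)     # two-sided p-value

    t_star = t.ppf(0.975, df=n - 1)               # critical value for a 95% interval
    half_width = t_star * s / np.sqrt(n)
    print(t_stat, p_value, (xbar - half_width, xbar + half_width))

Re-running the sketch after subtracting a constant from every observation shifts the interval downward without changing its half-width, which is precisely the point of (g).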
Activity 24-1 gives students more practice constructing a t-interval by hand, after using technology to calculate the summary statistics. Questions (c) and (d) make the important point that the interval estimates the population mean and not individual values. You might want to make this point especially clear to students who might gloss over it. I consider this point to be much more fundamental to understanding statistical inference than a more technical one such as the difference between "confidence" and "probability". Questions (e) and (f) lead students to discover that the sample size must quadruple to cut the half-width in half. Question (g) reminds students to question the method of data collection, and (h) points out that one can examine the distribution of sample data as a visual check on the assumption of population normality.
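The quadrupling in (e) and (f) comes straight from the formula: the half-width is t* times s divided by the square root of n, and the square root of 4n is twice the square root of n, so multiplying the sample size by four divides the half-width by (roughly) two; t* itself shrinks slightly as the degrees of freedom grow. A quick check with a made-up standard deviation:

    from math import sqrt
    from scipy.stats import t

    s = 10.0                                 # made-up sample standard deviation
    for n in (25, 100):
        t_star = t.ppf(0.975, df=n - 1)
        print(n, round(t_star * s / sqrt(n), 3))   # half-width roughly halves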
Activity 24-2 uses very contrived hypothetical data to make a specific point. The data are rigged so that, despite varying distributions of withdrawal amounts, each machine has exactly the same sample size, sample mean, and sample standard deviation. The machines therefore necessarily produce identical confidence intervals for the population mean. The moral is that the mean summarizes only one aspect of a distribution and that students should not forget to perform exploratory analyses of data.
Activity 24-3 introduces students to the paired comparisons design, asking them to analyze the differences in marriage ages for the couples. By forcing students to calculate the differences by hand, question (a) tries to make students appreciate the paired nature of the data. The remaining questions are fairly straightforward, although (e) does remind students that whether or not the interval contains zero is an important consideration when comparing two groups.
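In case it helps to have the mechanics written out, a paired analysis is simply a one-sample analysis of the differences; here is a sketch with made-up ages for a handful of couples:

    import numpy as np
    from scipy import stats

    husband = np.array([27, 31, 24, 35, 29])   # made-up ages
    wife    = np.array([25, 29, 25, 31, 27])
    diffs   = husband - wife                   # the paired differences

    print(stats.ttest_1samp(diffs, popmean=0))           # test of mean difference = 0
    mean, se = diffs.mean(), stats.sem(diffs)
    print(stats.t.interval(0.95, df=len(diffs) - 1, loc=mean, scale=se))   # 95% CI

Whether the resulting interval contains zero is exactly the consideration raised in (e).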
Activity 24-4 provides still one more reminder to students to think about the inference process before blindly applying inference procedures. These data do not constitute a sample from a population, so the confidence interval does not estimate any meaningful parameter.
Activity 25-1 tries to develop students' intuitions about the roles played by sample sizes, sample means, and sample standard deviations in comparing two means. I recommend going over the notation and the details of the interval and test with the class as a whole. You'll need to show students how to use the technology to perform the two-sample t-procedures. I suggest letting students tackle questions (f)-(j) on their own (again, I mean collaboratively) before discussing them as a group. I intend for students to spot that Barb's sample means differ more than Alex's, that Carl's times are less variable than Alex's, and that Donna has larger samples than Alex. Students should explain in (j) that these factors account for the significant differences found for the three commuters other than Alex.
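If it helps to have the command written down, the two-sample procedure in Python is essentially a one-liner once the data are entered; the commute times below are made up, and equal_var=False requests the unpooled version of the test:

    import numpy as np
    from scipy import stats

    alex = np.array([28, 32, 25, 30, 27, 33, 29, 31])   # made-up commute times
    barb = np.array([22, 20, 25, 18, 23, 21, 24, 19])

    print(stats.ttest_ind(alex, barb, equal_var=False))  # unpooled two-sample t-test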
Activity 25-2 gives students the chance to carry out an analysis from scratch, leading them through the steps of creating visual displays, calculating summary statistics, conducting a t-test, and finally complementing the test with a confidence interval.
Activity 25-3 not only gives students another opportunity to analyze real data but also reminds them once more to consider the data collection design before drawing conclusions.