Stat 301 – Final Exam Review Questions
1) Recall from the Exam 2 review problems the weights, in grams, of 30
(fun-size) Mounds candy bars and 20 (fun-size) PayDay candy bars. Suppose we
consider these to be independent random samples from these two brands.
(d) State null and alternative hypotheses for comparing the mean weights of
these two populations, both in symbols and in words.
(e) Do you think a
theory-based analysis would be appropriate for these data? Explain
how you are deciding.
Below are the results of
1000 random shuffles of these 50 weights into two groups.
(f) Is this distribution
approximately normal? Would you have expected this? Explain.
(g) Would you expect
this distribution to follow a t distribution? Explain.
(h) Use the above output to roughly approximate the p-value. Explain how.
(i) Explain a “difficulty”
with using this simulation approach to analyze these data.
(j) Assuming it’s valid,
how would you interpret this confidence interval?
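The shuffling procedure behind parts (f) through (i) can be sketched in Python as follows. This is only an illustration: the candy bar weights are not reproduced on this sheet, so the two short lists below are made-up placeholder values, and the function names are hypothetical. The p-value is approximated as the proportion of shuffled differences in means that are at least as extreme as the observed difference.

```python
import random
from statistics import mean

def shuffle_differences(group_a, group_b, reps=1000, seed=0):
    """Simulate differences in group means (a minus b) under random reassignment."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # randomly re-deal all weights into two groups
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

def two_sided_p_value(observed, diffs):
    """Proportion of shuffled differences at least as far from 0 as the observed one."""
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    return extreme / len(diffs)

# Placeholder weights in grams (NOT the actual Mounds and PayDay data).
mounds = [20.1, 19.8, 20.5, 19.9, 20.3]
payday = [18.2, 18.9, 18.5, 18.7]

observed = mean(mounds) - mean(payday)
diffs = shuffle_differences(mounds, payday, reps=1000, seed=1)
p_value = two_sided_p_value(observed, diffs)
print(f"observed difference = {observed:.3f} g, approximate p-value = {p_value:.3f}")
```

With the real data you would replace the two placeholder lists with the 30 and 20 recorded weights. The approximate p-value changes slightly from run to run unless the seed is fixed.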
2) A study examined
whether a nicotine lozenge can help a smoker to quit. The research article
reports on many background variables, such as age, weight, gender, number of
cigarettes smoked, and whether the person made a previous attempt to quit
smoking (Shiffman et al., 2002). Suppose the
researchers want to compare the distributions of the background variables
between the two treatment groups (nicotine lozenge or placebo lozenge).
(a) For each of the five
variables listed, indicate whether it calls for a comparison of means or a
comparison of proportions.
(b) Would the researchers
hope to reject the null hypotheses or fail to reject the null hypotheses in
these tests? Explain.
(c) Of the 459 nicotine users, 46.0% successfully abstained (didn’t start
smoking again) for 6 weeks, compared to 29.7% of the 458 in the control group
(without nicotine). Calculate and interpret a 95% confidence interval for the
difference between the two groups.
(d) Are you willing to draw a cause-and-effect conclusion from
this study? If not, suggest a possible confounding variable and explain how it
is confounding in this study.
(e) Are you willing to generalize these results to all smokers
interested in quitting? If not, suggest a possible source of sampling bias and
the likely direction of the bias.
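For part (c), one way to check a hand calculation is the short Python sketch below. It assumes the usual theory-based (Wald) interval for a difference in two proportions, which requires at least 10 successes and 10 failures in each group; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Wald interval for p1 - p2 using the unpooled standard error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Values from part (c): lozenge group first, placebo group second.
lower, upper = diff_proportion_ci(0.460, 459, 0.297, 458)
print(f"95% CI for (lozenge - placebo): ({lower:.3f}, {upper:.3f})")
```

The interpretation still has to be written in context: what the interval says about the difference in the probability of abstaining for 6 weeks, and which group is higher.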
3) Researchers examined
the long-term survival of doctors graduating from one medical school over one
century (Redelmeier and Kwong,
2004), comparing those who were presidents of their class to those who appeared
alphabetically before or alphabetically after the president in the graduating
class photograph. Statistics on
long-term mortality were obtained from licensing authorities, medical
obituaries, professional associations, alumni records, and national physician
directories (follow-up 94%). They
reported on 507 presidents and 1014 classmates.
(a) Is it reasonable to
treat the presidents and non-presidents as independent random samples?
Assuming the answer to (a) is yes:
(b) The researchers examined several baseline variables, including gender and
whether or not the individual wore glasses. They found 93% of the presidents
were male, compared to 85% of their classmates. They also found 9% of
presidents wore glasses, compared to 12% of their classmates. Is either of
these differences statistically significant?
(c) The overall life expectancy for the presidents was 49.0 years, compared to
51.4 years for their classmates. The two-sided p-value was reported to be .036.
Assuming the sample standard deviations were similar in the two samples, use
trial and error (in JMP, R, or the TOS applet) or algebra to approximate the
value of this standard deviation. What conclusion would you draw from this
p-value?
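Parts (b) and (c) can be checked with the Python sketch below. Two assumptions are made here that the problem does not state: part (b) uses a pooled two-proportion z-test, and part (c) replaces the t distribution with the standard normal (reasonable with over 1500 observations) and treats both groups as having one common standard deviation s. The function names are hypothetical, and because the reported percentages are rounded, the results are approximate.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: the two population proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value

def common_sd_from_p_value(mean1, n1, mean2, n2, p_value):
    """Solve |mean1 - mean2| / (s * sqrt(1/n1 + 1/n2)) = z* for s."""
    z_star = STD_NORMAL.inv_cdf(1 - p_value / 2)  # statistic matching the two-sided p-value
    return abs(mean1 - mean2) / (z_star * sqrt(1 / n1 + 1 / n2))

# Part (b): presidents (n = 507) versus classmates (n = 1014).
z_male, p_male = two_proportion_z(0.93, 507, 0.85, 1014)
z_glasses, p_glasses = two_proportion_z(0.09, 507, 0.12, 1014)
print(f"male:    z = {z_male:.2f}, two-sided p-value = {p_male:.4f}")
print(f"glasses: z = {z_glasses:.2f}, two-sided p-value = {p_glasses:.4f}")

# Part (c): back out the common standard deviation from the reported p-value.
s = common_sd_from_p_value(49.0, 507, 51.4, 1014, 0.036)
print(f"approximate common standard deviation: {s:.1f} years")
```

The algebra in `common_sd_from_p_value` is the same step you would do by hand: find the standardized statistic that gives a two-sided p-value of .036, then solve the test statistic formula for s.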
4) A poll conducted March
6–8, 2004, by The Wall Street Journal/NBC
News asked 1,018 respondents their opinions about gay marriage. When asked
to state whether they would favor or oppose “a constitutional amendment making
it illegal for gay couples to marry,” 43% responded in favor and 52% opposed
(5% were unsure). When asked whether
they would favor or oppose “a constitutional amendment that defined marriage as
a union between a man and a woman and made same-sex marriages
unconstitutional,” 54% favored the amendment, 42% opposed (1% said it depends,
and 3% were not sure). Would it be valid
to do a two-sample z-test with these
data? Explain.
5) In a study reported in the July 6,
2007 issue of the journal Science, researchers studied 396 American
college students and kept track of each student’s sex and also how many words
they spoke in a day. They found that females spoke an average of 16,215 words
per day and males an average of 15,669 words per day.
Consider the following variables:
For each research question below, which theory-based method would
you consider:
· One-proportion z-test or interval
· One-mean t-test or interval
· Two-proportion z-test or interval
· Two-mean t-test or interval
Briefly justify your answer.
(a) Do women tend to use more words than men?
(b) How often does the proportion of adjectives a person uses in a day exceed
0.25? In other words, estimate the probability that more than 25% of the words
someone uses in a day are adjectives.
(c) Are women more likely than men to use more than 15,000 words
per day?
(d) Do people tend to talk more (use more words) on the weekends
or on the weekdays?
6) The Roller
Coaster Database maintains a web site (www.rcdb.com) with data
on roller coasters around the world. Some of the data recorded
include whether the coaster is made of wood or steel and the maximum speed
achieved by the coaster, in miles per hour. The boxplots display the
distributions of speed by type of coaster for 145 coasters in the United
States as of Nov. 2003.
(a) Do these boxplots allow you to determine whether there are
more wooden or steel roller coasters?
(b) Do these boxplots allow you to say which type has a higher
percentage of coasters that go faster than 60 mph? Explain and, if
so, answer the question.
(c) Do these boxplots allow you to say which type has a higher
percentage of coasters that go faster than 50 mph? Explain and, if
so, answer the question.
(d) Do these boxplots allow you to say which type has a higher
percentage of coasters that go faster than 48 mph? Explain and, if
so, answer the question.
(e) The steel coasters have a “high outlier.” Explain how I know
this from the above display and interpret this outlier in context. What would
be your next step in analyzing these data?
(f) Conjecture as to how the mean, median, interquartile range,
and standard deviation will change (if at all) if that coaster identified in
part (e) (Top Thrill Dragster in Cedar Point Amusement Park, Sandusky, Ohio) is
removed from the data set. Explain your reasoning.
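To test your conjectures in parts (e) and (f), a small Python sketch like the one below may help. The speeds are made-up placeholder values, not the rcdb.com data, and the function name is hypothetical. It assumes the common 1.5 × IQR rule for flagging outliers; quartile conventions differ slightly between software packages, so the numbers may not match JMP or R exactly.

```python
from statistics import mean, median, quantiles, stdev

def summarize(speeds):
    """Return the mean, median, IQR, SD, and the upper 1.5*IQR outlier fence."""
    q1, _, q3 = quantiles(speeds, n=4)  # quartiles (Python's default 'exclusive' method)
    iqr = q3 - q1
    return {
        "mean": mean(speeds),
        "median": median(speeds),
        "iqr": iqr,
        "sd": stdev(speeds),
        "upper_fence": q3 + 1.5 * iqr,
    }

# Placeholder speeds in mph (NOT the actual coaster data).
speeds = [40, 45, 50, 55, 60, 65, 70, 120]
fastest = max(speeds)

with_outlier = summarize(speeds)
without_outlier = summarize([x for x in speeds if x != fastest])

print("fastest coaster flagged as outlier:", fastest > with_outlier["upper_fence"])
for name in ("mean", "median", "iqr", "sd"):
    print(f"{name}: {with_outlier[name]:.2f} -> {without_outlier[name]:.2f}")
```

Compare how far each summary moves when the fastest value is dropped, then relate that to which measures are resistant to outliers.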