Workshop Statistics: Discovery with Data, Second Edition
Topic 23: More Inference Considerations
Activity 23-1: Racquet Spinning (cont.)
(a) (.362, .558)
(b) yes
(c) Based on our answer to (b), we would not expect a test of whether
theta differs from .5 to be significant at the .05 level, because the
interval indicates that .5 is a plausible value for theta.
(d)-(h)
Hypothesized value   Contained in 95% CI?   Test statistic   p-value   Significant at .05?
.35                  no                      2.31            .021      yes
.40                  yes                     1.22            .221      no
.50                  yes                     -.80            .424      no
.55                  yes                    -1.81            .070      no
.60                  no                     -2.86            .004      yes
(i) If the hypothesized value is contained in the 95% confidence interval,
the two-sided test of that value will not be significant at the .05 level,
and vice versa.
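The relationship in (i) can be checked numerically. The sketch below assumes a sample proportion of .46 from n = 100 spins; neither value is stated in these answers, but both are consistent with the interval in (a). The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Assumed sample values (not stated in the answers above); they are
# chosen to be consistent with the interval (.362, .558) in part (a).
P_HAT = 0.46
N = 100

def conf_interval(p_hat, n, level=0.95):
    """Large-sample confidence interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def z_test_two_sided(p_hat, n, theta0):
    """Test statistic and two-sided p-value for H0: theta = theta0."""
    z = (p_hat - theta0) / sqrt(theta0 * (1 - theta0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

low, high = conf_interval(P_HAT, N)
print(f"95% CI: ({low:.3f}, {high:.3f})")
for theta0 in (0.35, 0.40, 0.50, 0.55, 0.60):
    z, p = z_test_two_sided(P_HAT, N, theta0)
    in_ci = low <= theta0 <= high
    print(f"{theta0:.2f}  in CI: {in_ci!s:5}  z = {z:5.2f}  p = {p:.3f}  "
          f"significant: {p < 0.05}")
```

The interval uses the sample proportion in its standard error while the test uses the hypothesized value, so the agreement between the two is approximate in general. It holds for each of the five values tried here.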
Activity 23-2: Racquet Spinning (cont.)
(a) sample proportion: .565; test statistic: 1.84; p-value:
.066; significant at .05?: no
(b) sample proportion: .575; test statistic: 2.12; p-value:
.034; significant at .05?: yes
(c) sample proportion: .650; test statistic: 4.24; p-value:
.000; significant at .05?: yes
(d) a and b
(e) b and c
Activity 23-3: Cat Households (cont.)
(a) 99.9% confidence interval: (.268, .278); Hypotheses: H0: theta = .25,
Ha: theta > .25; test statistic: 15.02; p-value: less than .0002 (.0000
to many decimal places).
(b) Yes, because the entire 99.9% confidence interval is greater than
.25, and the test of significance reveals very strong evidence against
the null hypothesis that theta equals .25.
(c) No, it is most likely only about 2-3 percentage points more than 25%,
since the 99.9% confidence interval is only (.268, .278).
(d) confidence interval
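The contrast between (b) and (c) can be seen by computing both procedures side by side. This sketch assumes a sample proportion of .273 from n = 80,000 households; those values are not stated in these answers but are consistent with the interval and test statistic in (a).

```python
from math import erfc, sqrt
from statistics import NormalDist

# Assumed sample values (not stated in the answers above); they are
# consistent with the reported interval and test statistic.
P_HAT = 0.273
N = 80_000
THETA0 = 0.25

# 99.9% confidence interval for theta.
z_star = NormalDist().inv_cdf(1 - 0.001 / 2)
margin = z_star * sqrt(P_HAT * (1 - P_HAT) / N)
ci = (P_HAT - margin, P_HAT + margin)

# One-sided test of H0: theta = .25 against Ha: theta > .25.
z = (P_HAT - THETA0) / sqrt(THETA0 * (1 - THETA0) / N)
p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail area, stable for large z

print(f"99.9% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
print(f"estimated excess over .25: {ci[0] - THETA0:.3f} to {ci[1] - THETA0:.3f}")
```

The tiny p-value says only that theta exceeds .25; the interval is what shows how small the excess is.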
Activity 23-4: Hypothetical Baseball Improvements
Students' answers to (a)-(g) may differ. These are meant to be sample
answers.
(a)
[Dotplot omitted; horizontal axis: number of hits]
This distribution is symmetrical, with the center at 7. The spread is
from 1 to 17.
(b) About 13.
(c)
[Dotplot omitted; horizontal axis: number of hits]
This distribution is also symmetrical, with the center at 10. The spread
is from 2 to 18. There is quite a bit of overlap between the two
distributions.
(d) Maybe 1/5 (actual power=.166 here).
(e) No. Even though he is actually a .333 hitter, there is only about a
30% chance that he will get enough hits (>13) to convince us that he's
not a .250 hitter.
(f) Approximate power = .20
Note: there is a typo in (g) of the Minitab version; the hypothesized p
should be .25. The answer to (g) is .2923 for n=30 and .5987 for n=100.
The answers to (g)-(j) below are the answers to (h)-(k) in the Minitab
version.
(g)
There is less overlap between the two distributions. From the first
distribution, he'd be in the top 5% of performances if he got more than
35 hits or so. When he is a .333 hitter, he will get more than 35 hits
almost 50% of the time. Thus, there is a much higher chance that he will
be able to convince us he is better than a .250 hitter. Increasing the
sample size gives us more evidence and increases the power we have to detect
that his performance has improved.
(h) more powerful (the distribution will now be centered higher and
have less overlap with the .250 distribution). He's a much better hitter,
so it's much easier for him to perform convincingly higher than a .250
hitter.
(i) more powerful. He doesn't need to perform as high to convince us
he has improved.
(j) alternative value, level of significance
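The power values quoted in the note for the Minitab version (.2923 for n=30 and .5987 for n=100) are consistent with a normal-approximation calculation for a one-sided test of theta = .25 at the .05 level when the true value is .333. The sketch below carries out that calculation. That this is the method behind the quoted values is an assumption, and the function name approx_power is illustrative. The alternative value .400 and the level .10 in the last two lines are hypothetical choices used only to illustrate (h) and (i).

```python
from math import sqrt
from statistics import NormalDist

def approx_power(theta0, theta_alt, n, alpha=0.05):
    """Normal-approximation power of the one-sided test of
    H0: theta = theta0 against Ha: theta > theta0."""
    std_normal = NormalDist()
    # Smallest sample proportion that rejects H0 at level alpha.
    cutoff = theta0 + std_normal.inv_cdf(1 - alpha) * sqrt(theta0 * (1 - theta0) / n)
    # Probability of exceeding that cutoff when theta = theta_alt.
    sd_alt = sqrt(theta_alt * (1 - theta_alt) / n)
    return 1 - std_normal.cdf((cutoff - theta_alt) / sd_alt)

# Effect of sample size, as in (g).
for n in (30, 100):
    print(n, round(approx_power(0.25, 0.333, n), 4))

# Hypothetical illustrations of (h) and (i): a larger alternative value
# or a larger significance level each raise the power.
print(round(approx_power(0.25, 0.400, 30), 4))
print(round(approx_power(0.25, 0.333, 30, alpha=0.10), 4))
```

These approximate values need not match the simulation-based sample answers in (d)-(f), which depend on the particular dotplots a student generates.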
Activity 23-5: Halloween Practices (cont.)
(a) 1.96 sqrt(.69(.31)/1005) = .0286
(b) We would need a larger sample, because a larger sample reduces the
sampling variability of the sample proportion and so shrinks the margin
of error.
(c) Solve 1.96 sqrt(.69(.31)/n) = .01 for n to find that n = 8,218
(d) We would need even more people because increasing the confidence
without changing the margin of error would require a larger sample size.
(e) Solve 2.576 sqrt(.69(.31)/n) = .01 for n to find that n = 14,194
(f) The population size did not enter into these calculations at all.
The answers to (c) and (e) would be no different if the population of interest
were all California adults rather than all American adults.
(g) Every person in the population would have to be interviewed to
determine the value of the population proportion exactly, with 100% confidence.
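The arithmetic in (a), (c), and (e) can be checked with a short script. This is a minimal sketch; the function names are illustrative, and the rounded critical values 1.96 and 2.576 are taken from the answers above. Note that the population size appears nowhere in the code, which is the point of (f).

```python
from math import ceil, sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, level=0.95):
    """Margin of error of the large-sample interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

def required_n(p_hat, margin, z_star):
    """Smallest n with z_star * sqrt(p_hat(1 - p_hat)/n) <= margin."""
    return ceil((z_star / margin) ** 2 * p_hat * (1 - p_hat))

print(round(margin_of_error(0.69, 1005), 4))   # part (a)
print(required_n(0.69, 0.01, 1.96))            # part (c), 95% confidence
print(required_n(0.69, 0.01, 2.576))           # part (e), 99% confidence
```

Solving for n gives n = (z*/m)^2 p(1 - p), rounded up because a fractional respondent is not possible.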
These distributions are very different from each other. While many of
their descriptive statistics, such as sample size, mean, and standard
deviation, are the same, the distributions still differ greatly.
Activity 23-7: Female Senators (cont.)
(a) (.034, .146)
(b) no
(c) The technical conditions necessary for this procedure to be valid
are not fully met. This sample is not a simple random sample from
the population of interest. The male/female ratio in the 1999 U.S.
Senate is not representative of all humans.
(d) The interval does not make sense for this purpose because we know
the population proportion of women to be .09 for the 1999 U.S. Senate.