Workshop Statistics: Discovery with Data and Fathom
Topic 18: Central Limit Theorem
Activity 18-1: Smoking Rates
(a) q
(b) Not necessarily, due to sampling variability.
(c) The CLT says that this sample proportion should have an approximately
normal distribution, with mean equal to q=.229 and standard deviation
equal to sqrt(q(1-q)/n) = sqrt(.229*.771/100) = .042.
(d) Shade .25 and above. Guesses will vary from student to student.
(e) mean of sampling distribution: .229; standard deviation of
sampling distribution: .042; z-score standardizing .25 = (.25-.229)/.042
= .5
(f) probability of a z-score above .5 = 1-.6915 = .3085
(g) The sampling distribution based on a larger sample size will still
be centered at .229, but will exhibit less variability: sqrt(.229(1-.229)/400)=.021.
We expect the sample proportions to fall closer to .229, so the probability
of a sample result above .25 would be smaller.
(h) z=(.25-.229)/.021 = 1.00, proportion above = 1-.8413 = .1587
(i) Variability will decrease and a sample proportion above .25 will
be even less likely. z=(.25-.229)/sqrt(.229(1-.229)/1600) = 2, proportion
above = 1-.9772 = .0228
(j) no
(k) There would be no changes.
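The calculations in (c) through (i) can be checked with a short Python sketch. It uses only the standard library; the function name prob_above is a hypothetical helper introduced here, not something from the book or from Fathom. Because the code does not round the z-score before looking up the normal probability, its results may differ from the table-based answers above in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

Q = 0.229        # population proportion of smokers, from part (c)
CUTOFF = 0.25    # sample proportion of interest

def prob_above(cutoff, q, n):
    """Return (sd, z, P(sample proportion > cutoff)) under the normal approximation."""
    sd = sqrt(q * (1 - q) / n)    # standard deviation of the sample proportion
    z = (cutoff - q) / sd         # standardized cutoff
    return sd, z, 1 - NormalDist().cdf(z)

# Sample sizes used in parts (c)-(f), (g)-(h), and (i)
for n in (100, 400, 1600):
    sd, z, p = prob_above(CUTOFF, Q, n)
    print(f"n = {n:4d}: sd = {sd:.3f}, z = {z:.2f}, P(above .25) = {p:.4f}")
```

Quadrupling the sample size halves the standard deviation, which doubles the z-score and shrinks the probability above .25.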
Activity 18-2: Smoking Rates (cont.)
(a) SD=sqrt(.142(1-.142)/100)=.035
[Sketch of the sampling distribution; horizontal axis: sample proportion values]
(b) mean=.142, sd=.035
z=(.25-.142)/.035 = 3.09
proportion above = .001
So P(p-hat > .25) = .001
(c) The standard deviation will decrease, which increases the z-score,
which decreases the probability that the sample proportion would exceed
.25.
(d) We would have strong reason to doubt that the state was Utah. The
probability of finding more than 25 smokers in a random sample of 100 Utah
residents is so small (.001) that it would be hard to believe that such
a sample would yield 25 smokers.
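As a rough check on (b) and (c), the following Python sketch repeats the calculation with q = .142. The helper name prob_above is hypothetical, and the larger sample size n = 400 is only an illustrative choice to show the direction of the change described in (c); it is not a value taken from the activity.

```python
from math import sqrt
from statistics import NormalDist

def prob_above(cutoff, q, n):
    """Return (z, P(sample proportion > cutoff)) under the normal approximation."""
    sd = sqrt(q * (1 - q) / n)    # standard deviation of the sample proportion
    z = (cutoff - q) / sd
    return z, 1 - NormalDist().cdf(z)

z100, p100 = prob_above(0.25, 0.142, 100)   # values from part (b)
z400, p400 = prob_above(0.25, 0.142, 400)   # illustrative larger sample (assumption)

print(f"n = 100: z = {z100:.2f}, P(above .25) = {p100:.4f}")
print(f"n = 400: z = {z400:.2f}, P(above .25) = {p400:.2e}")
```

The larger sample gives a larger z-score and a smaller probability, as (c) states.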
Activity 18-3: Candy Bar Weights (cont.)
(a) Want P(2.18 < xbar < 2.22), where xbar follows a normal distribution
with mean mu=2.20 and standard deviation sigma=.04
Z(2.18) = (2.18-2.20)/.04 = -.5
proportion below -.5 = .3085
Z(2.22) = (2.22-2.20)/.04 = .5
proportion below .5 = .6915
Subtracting to find the area between = .6915-.3085 = .3830
(b) These sample means will have a normal distribution, centered at
mu=2.20, but now with standard deviation sigma/sqrt(5) = .04/sqrt(5) = .018
(c) shading
(d) The z-scores are (2.18-2.20)/.018 = -1.11 and (2.22-2.20)/.018
= 1.11, so the probability that the average weight of 5 candy bars will
be between 2.18 and 2.22 ounces is .8665-.1335 = .7330.
(e) The probability would increase if the sample size were 40 instead
of 5, because the standard deviation would decrease and the sample means
would be more concentrated around 2.20, so a greater share of them would
fall in the middle range between 2.18 and 2.22.
(f) The standard deviation of the sample means is now sigma/sqrt(n)
= .04/sqrt(40) = .0063. The z-scores are (2.18-2.20)/.0063 = -3.17
and (2.22-2.20)/.0063 = 3.17, so the probability that the average weight
of 40 candy bars will be between 2.18 and 2.22 ounces is .9992-.0008 =
.9984
(g) The calculations in (f) would remain approximately correct even
if the candy bar weights themselves had a skewed, nonnormal distribution,
since the Central Limit Theorem establishes that the distribution of sample
means will be approximately normal for a sample size as large as n = 40.
With a skewed population, the normal approximation would not be valid for
samples as small as n = 1 or n = 4.
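The probabilities in (a), (d), and (f), and the claim in (g), can be explored with the Python sketch below. The function names are hypothetical. The skewed population in simulated_prob is an assumption made purely for illustration (a shifted exponential chosen so that its mean is 2.20 and its standard deviation is .04); the activity does not specify any particular skewed distribution. The normal-model results may differ slightly from the answers above, which round the z-scores before using the table.

```python
import random
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 2.20, 0.04    # population mean and standard deviation (ounces)
LO, HI = 2.18, 2.22       # range of interest for the sample mean

def prob_mean_between(lo, hi, mu, sigma, n):
    """Normal-model probability that the mean of n weights lies between lo and hi."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)

def simulated_prob(n, reps=10_000, seed=1):
    """Fraction of simulated sample means between LO and HI for a skewed population.

    Assumption for illustration only: each weight is 2.16 plus an exponential
    amount with mean 0.04, giving mean 2.20 and standard deviation 0.04.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(2.16 + rng.expovariate(1 / 0.04) for _ in range(n)) / n
        hits += LO < xbar < HI
    return hits / reps

for n in (1, 5, 40):
    p = prob_mean_between(LO, HI, MU, SIGMA, n)
    print(f"n = {n:2d}: normal-model P(2.18 < xbar < 2.22) = {p:.4f}")

print(f"n = 40, simulated skewed population: {simulated_prob(40):.4f}")
```

With n = 40 the simulated fraction should land close to the normal-model value even though the individual weights are strongly skewed.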
Activity 18-4: Candy Bar Weights (cont.)
(a) SD(xbar)=.04/sqrt(60)=.00516
Z(2.19)=(2.19-2.20)/.00516 = -1.94, proportion below -1.94 = .0262
Z(2.21) =1.94, proportion below = .9738
probability of an xbar value between 2.19 and 2.21 = .9476
[Sketch of the sampling distribution; horizontal axis: sample mean values]
(b) The z-scores are the same as in (a), so the probability remains
.9476.
[Sketch of the sampling distribution; horizontal axis: sample mean values]
(c) They are equal.
(d) .9476, because the difference between the observation and the population
mean is still ±.01, and the standard deviation does not change.
(e) There is a very high probability (.9476) that a sample of size
60 would result in a sample mean weight within ±.01 of the actual
population mean.
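The point of (b) through (e), that the probability of landing within .01 of the population mean does not depend on what that mean is, can be sketched in Python as follows. The helper name prob_within is hypothetical, and the population means other than 2.20 are arbitrary illustrative values, not values from the activity. The result differs from .9476 in the fourth decimal place because the code does not round the z-score to 1.94.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N, MARGIN = 0.04, 60, 0.01    # values from part (a)

def prob_within(mu, sigma=SIGMA, n=N, margin=MARGIN):
    """Normal-model probability that the sample mean falls within margin of mu."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)

# 2.20 is the mean from part (a); the other two means are illustrative assumptions
for mu in (2.20, 2.00, 3.50):
    print(f"mu = {mu:.2f}: P(xbar within .01 of mu) = {prob_within(mu):.4f}")
```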
Activity 18-5: Solitaire (cont.)
(a) std dev(p-hat) = sqrt((1/9)(8/9)/10) = .099
z = (.10-1/9)/.099 = -.1122
proportion below = .4562
P(p-hat < .10) = .4562
(b) .3079+.3849 = .6928 (Note: the probability of zero wins is .3079,
not .0379 as appeared in some printings of the book.)
(c) These are not at all close.
(d) The CLT provides a poor approximation for each probability in this
situation because the technical conditions concerning n and q
needed for the validity of the CLT are not met. n*q is only 1.11,
which is not greater than or equal to 10. n(1-q) is only 8.89,
which is not greater than or equal to 10.
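The comparison in (a) through (d) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; binom_pmf is a hypothetical helper name. The normal-approximation value differs slightly from .4562 because the code does not round the standard deviation or the z-score.

```python
from math import comb, sqrt
from statistics import NormalDist

N, Q = 10, 1 / 9    # sample size and win probability used in this activity

def binom_pmf(k, n, q):
    """Exact binomial probability of k wins in n games."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

# Part (a): normal approximation for the sample proportion
sd = sqrt(Q * (1 - Q) / N)
normal_approx = NormalDist(Q, sd).cdf(0.10)

# Part (b): exact probability of zero wins or one win
exact = binom_pmf(0, N, Q) + binom_pmf(1, N, Q)

# Part (d): technical conditions for the normal approximation
conditions_met = N * Q >= 10 and N * (1 - Q) >= 10

print(f"normal approximation: {normal_approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
print(f"n*q = {N * Q:.2f}, n(1-q) = {N * (1 - Q):.2f}, conditions met: {conditions_met}")
```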