Workshop Statistics: Discovery with Data and Fathom
Topic 5: Measures of Spread
Activity 5-1: City Temperatures
(a) Raleigh median: 59.5 degrees
SF median: 57 degrees
Pretty close to each other
(b) No, also need to consider the variability in the distributions
of temperature
(c) Raleigh has more variability
(d) Raleigh
highest: 78 lowest: 39 diff=39
(e) San Francisco
highest: 65 lowest: 49 diff=16
(f) The data below the median are 39, 42, 43, 50, 51, 59. The median
of these and therefore the lower quartile of the distribution is 46.5.
39 42 43 | 50 51 59
median=46.5
(g) The data above the median are 60, 67, 71, 74, 77, 78. The median
of these and therefore the upper quartile of the distribution is 72.5.
60 67 71 | 74 77 78
median=72.5
(h) Below 46.5 are three of the twelve values (25%). Above 72.5 are
three of the twelve values (25%). Between the two are six of the twelve
values (50%).
(i) IQR = upper quartile - lower quartile = 72.5 - 46.5 = 26
(j) Raleigh has the greater interquartile range (26, versus 62.5 - 52.5 = 10
for San Francisco), indicating more variability in the temperatures.
(k)
                 minimum  lower quartile  median  upper quartile  maximum
Raleigh             39         46.5        59.5        72.5          78
San Francisco       49         52.5        57          62.5          65
(l)
(m)
(n)
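Check for parts (f) through (k): the Raleigh quartiles, IQR, and range can be reproduced with a short Python sketch. It assumes the quartile rule used in (f) and (g), where each quartile is the median of the values on one side of the median position. The function name five_number_summary is illustrative only, and library quantile routines (like Fathom's) may use a different rule and give slightly different quartiles.

```python
from statistics import median

# Raleigh temperatures, sorted, as listed in parts (f) and (g).
raleigh = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles are taken as the medians of the lower and upper halves;
    with an odd count the middle value is left out of both halves
    (an assumption about the text's convention).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]      # values below the median position
    upper = data[-half:]     # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary(raleigh)
iqr = summary[3] - summary[1]          # upper quartile - lower quartile
data_range = summary[4] - summary[0]   # maximum - minimum

print(summary)
print(iqr, data_range)
```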
Activity 5-2: City Temperatures (cont.)
(a) March: –9.25 and July 18.75
(b) Deviations from the mean always sum to zero (exactly, if enough decimal
places are carried)
(c) March: 9.25; July: 18.75; sum = 143
(d) 11.92
(e) March: 85.56 and July: 351.56, sum = 2208.22 (adding the rounded squares;
2208.25 without rounding)
(f) 2208.22/(12-1)=200.75 (degrees squared)
(g) sqrt(200.75) = 14.17 degrees
(h) std (Raleigh) = 14.17 degrees
std (SF) =5.75 degrees
The standard deviation for Raleigh's temperatures
is indeed larger.
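The Raleigh figures in (a) through (g) can be retraced step by step. The sketch below is a check under stated assumptions, not the text's own procedure: it uses the twelve Raleigh values listed in Activity 5-1 (f) and (g), and the variable names are illustrative. The values 50 and 78 are the ones that reproduce the March and July deviations.

```python
from math import sqrt

# Raleigh temperatures from Activity 5-1 (sorted; order does not matter here).
raleigh = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]

n = len(raleigh)
mean = sum(raleigh) / n

# (a)-(b): deviations from the mean, which sum to zero.
deviations = [x - mean for x in raleigh]

# (c)-(d): sum of absolute deviations and mean absolute deviation.
abs_sum = sum(abs(d) for d in deviations)
mad = abs_sum / n

# (e)-(g): sum of squared deviations, variance (divide by n - 1), std. dev.
sum_sq = sum(d * d for d in deviations)   # unrounded sum of squares
variance = sum_sq / (n - 1)
std_dev = sqrt(variance)

print(mean, sum(deviations))
print(abs_sum, round(mad, 2))
print(sum_sq, variance, round(std_dev, 2))
```

Dividing by n - 1 rather than n in the variance step matches the 12 - 1 in (f).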
Activity 5-3: Interpreting Spread and Boxplots
(a) A: upscale (smallest spread); B: sports (largest spread); C: small
cars
(b) Boxplot (i) displays textbook prices since their distribution is
skewed left (min is much smaller than lower quartile).
Boxplot (ii) displays the Senators’
years of service (skewed right; max is much higher than upper quartile).
Boxplot (iii) pertains to the % urban
(roughly symmetric; similar distance between lower quartile and median
as upper quartile and median).
Activity 5-4: Supreme Court Justices (cont.)
(a)-(c)
                               std. dev.   IQR   range
Justices                          7.89      10     22
Justices with "big" outlier      13.21      10     42
Justices with "huge" outlier     72         10    222
(d) Only the interquartile range is resistant because it is not affected
by these outliers. The others are strongly affected by outliers.
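The same resistance property can be seen on any data set. Because the justices' data are not reproduced in this key, the sketch below reuses the Raleigh temperatures from Activity 5-1 and replaces the maximum with a made-up outlier of 178. That outlier is purely hypothetical, and the helper names are illustrative.

```python
from statistics import median, stdev

def iqr(values):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])

def spread(values):
    """Return (standard deviation, IQR, range) for a list of numbers."""
    return (stdev(values), iqr(values), max(values) - min(values))

raleigh = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]
with_outlier = raleigh[:-1] + [178]   # hypothetical outlier replaces 78

sd_a, iqr_a, range_a = spread(raleigh)
sd_b, iqr_b, range_b = spread(with_outlier)

print(round(sd_a, 2), iqr_a, range_a)
print(round(sd_b, 2), iqr_b, range_b)
```

Only the IQR is identical in both lines of output; the standard deviation and the range both grow once the outlier is introduced.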
Activity 5-5: Placement Exam Scores (cont.)
(a) The distribution of placement exam scores does appear to be roughly
symmetric and mound-shaped.
(b) The upper endpoint is 10.221 + 3.859 = 14.080. The lower
endpoint is 10.221 – 3.859 = 6.362.
(c) The scores in between these endpoints are 7, 8, 9, 10, 11, 12,
13, and 14. There are 16+15+17+32+17+21+12+16 = 146 of them. This
proportion is 146/213 = .685.
(d) There are 202 of the 213 scores within two standard deviations
of the mean. This proportion is .948.
(e) All 213 scores, a proportion of 1.0, fall within three standard
deviations of the mean.
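The one-standard-deviation check in (b) and (c) can be sketched in Python. The mean, standard deviation, total of 213, and the counts come from the answers above; pairing the counts with the scores 7 through 14 in the order listed is an assumption. Counts for the other scores are not given here, so the two- and three-standard-deviation tallies are not recomputed.

```python
mean, sd = 10.221, 3.859
total_scores = 213

# Number of students at each score from 7 to 14, in the order given in (c).
counts = {7: 16, 8: 15, 9: 17, 10: 32, 11: 17, 12: 21, 13: 12, 14: 16}

# (b): endpoints one standard deviation either side of the mean.
lower, upper = mean - sd, mean + sd

# (c): count the scores that fall between the endpoints.
within_one = sum(c for score, c in counts.items() if lower <= score <= upper)
proportion = within_one / total_scores

print(round(lower, 3), round(upper, 3))
print(within_one, round(proportion, 3))
```

The resulting proportion is close to the 68% that the empirical rule predicts for mound-shaped distributions.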
Activity 5-6: SATs and ACTs
(a) Bobby scored 184 points above the SAT mean.
(b) Kathy scored 7.4 points above the ACT mean.
(c) Such a conclusion is not sensible because the scales on the two
exams differ so greatly, with the SAT involving much higher numbers than
the ACT.
(d) Bobby scored (1080-896)/174 = 1.06 standard deviations above the
SAT mean.
(e) Kathy scored (28-20.6)/5.2 = 1.42 standard deviations above the
ACT mean.
(f) Kathy has the higher z-score.
(g) Kathy performed better relative to the peers who took the same
exam.
(h) Peter's z-score is (740-896)/174 = -0.90. Kelly's z-score is (19-20.6)/5.2
= -0.31.
(i) Kelly's z-score is higher (less negative).
(j) A z-score is negative when the observation's value is below the
mean.
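The z-scores in (d), (e), and (h) all follow the same formula, z = (value - mean) / standard deviation. A minimal Python sketch, using the means and standard deviations that appear in the calculations above (the function and variable names are illustrative):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

SAT_MEAN, SAT_SD = 896, 174
ACT_MEAN, ACT_SD = 20.6, 5.2

z = {
    "Bobby (SAT 1080)": z_score(1080, SAT_MEAN, SAT_SD),
    "Kathy (ACT 28)": z_score(28, ACT_MEAN, ACT_SD),
    "Peter (SAT 740)": z_score(740, SAT_MEAN, SAT_SD),
    "Kelly (ACT 19)": z_score(19, ACT_MEAN, ACT_SD),
}

for name, value in z.items():
    print(f"{name}: {value:.2f}")
```

Because z-scores are unit-free, they can be compared across the two exams even though the raw scales differ.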
Activity 5-7: Value of Statistics (cont.)
(a)-(b) Predictions will vary from student to student.
(c)
                      class F  class G  class H  class I  class J
range                    6        8        8        8        8
interquartile range      2.5      2        0        8        4
standard deviation       1.8      2.0      1.2      4        2.7
(d) This could be argued either way. G has a higher range and standard
deviation, but F has a larger IQR. NOTE: Fathom's IQR (and
quartiles) differ from the text's.
(e) Class I's scores have the most variability, while class H's have
the least.
(f) Class F's scores have more bumpiness but not more variability than
class G's.
(g) Class J has the highest number of distinct values but not the most
variability.
(h) Neither "bumpiness" nor "variety" is directly related to "variability."
This exercise shows that a bumpier distribution can have less variability
than a less bumpy one and that a distribution with more variety of values
can have less variability than one with less variety of values. Variability
instead is a measure of how far apart the observations are from each other,
i.e., their spread.