Workshop Statistics: Discovery with Data and Fathom

Topic 5: Measures of Spread

Activity 5-1: City Temperatures

(a) Raleigh median: 59.5 degrees
    SF median: 57 degrees
    Pretty close to each other
(b) No, also need to consider the variability in the distributions of temperature
(c) Raleigh has more variability
(d) Raleigh
    highest: 78 lowest: 39 diff=39
(e) San Francisco
    highest: 65 lowest: 49 diff=16
(f) The data below the median are 39, 42, 43, 50, 51, 59. The median of these and therefore the lower quartile of the distribution is 46.5.
     39 42 43 | 50 51 59
    median=46.5
(g) The data above the median are 60, 67, 71, 74, 77, 78. The median of these and therefore the upper quartile of the distribution is 72.5.
    60 67 71 | 74 77 78
    median=72.5
(h) Below 46.5 are three of the twelve values (25%). Above 72.5 are three of the twelve values (25%). Between the two are six of the twelve values (50%).
(i) IQR = upper quartile - lower quartile = 72.5 - 46.5 = 26
(j) Raleigh has a greater interquartile range, indicating more variability in the temps.
(k)

	minimum	lower quartile	median	upper quartile	maximum
Raleigh	39	46.5	59.5	72.5	78
San Francisco	49	52.5	57	62.5	65

(l)

(m)

(n)
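The quartile and IQR arithmetic in parts (f)-(j) can be checked with a short Python sketch. It assumes the twelve Raleigh temperatures are exactly the values listed in parts (f) and (g), and it uses the text's rule that each quartile is the median of the lower or upper half of the sorted data; the function name five_number_summary is only illustrative. Software that uses a different quartile rule may give slightly different answers.

```python
from statistics import median

# Raleigh monthly temperatures, sorted, as listed in parts (f) and (g).
raleigh = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles are the medians of the lower and upper halves of the
    sorted data; with an odd count the middle value is left out of
    both halves. Assumes at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

summary = five_number_summary(raleigh)
iqr = summary[3] - summary[1]

print(summary)  # (39, 46.5, 59.5, 72.5, 78)
print(iqr)      # 26.0
```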

Activity 5-2: City Temperatures (cont.)

(a) March: –9.25 and July: 18.75
(b) The deviations from the mean always sum to zero (exactly, if enough decimal places are carried)
(c) March: 9.25; July: 18.75; sum = 143
(d) 11.92
(e) March: 85.56 and July: 351.56; sum = 2208.22 (2208.25 if the squared deviations are not rounded before adding)
(f) 2208.22/(12-1)=200.75 (degrees squared)
(g) sqrt(200.75) = 14.17 degrees
(h) std (Raleigh) = 14.17 degrees
    std (SF) = 5.75 degrees
    The standard deviation for Raleigh's temperatures is indeed larger.
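The chain of calculations in parts (a)-(g) can be reproduced with the sketch below. It assumes the Raleigh data are the twelve values listed in Activity 5-1 (f) and (g); the variable names are illustrative. Because the code does not round the squared deviations before adding them, its sum of squares is 2208.25, which leads to the same variance and standard deviation after rounding.

```python
from math import sqrt

# Raleigh monthly temperatures (values from Activity 5-1, parts (f) and (g)).
raleigh = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]

n = len(raleigh)
mean = sum(raleigh) / n                      # 59.25

deviations = [x - mean for x in raleigh]     # e.g. 50 - 59.25 = -9.25
sum_dev = sum(deviations)                    # deviations sum to zero

sum_abs = sum(abs(d) for d in deviations)    # 143.0
mad = sum_abs / n                            # mean absolute deviation, about 11.92

sum_sq = sum(d * d for d in deviations)      # 2208.25 (unrounded)
variance = sum_sq / (n - 1)                  # 200.75 degrees squared
sd = sqrt(variance)                          # about 14.17 degrees

print(round(mad, 2), variance, round(sd, 2))
```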

Activity 5-3: Interpreting Spread and Boxplots

(a) A: upscale (smallest spread); B: sports (largest spread); C: small cars
(b) Boxplot (i) displays textbook prices since their distribution is skewed left (min is much smaller than lower quartile).
Boxplot (ii) displays the Senators’ years of service (skewed right; max is much higher than upper quartile).
Boxplot (iii) pertains to the % urban (roughly symmetric; similar distance between lower quartile and median as upper quartile and median).

Activity 5-4: Supreme Court Justices (cont.)

(a)-(c)

	std. dev.	IQR	range
Justices	7.89	10	22
Justices with "big" outlier	13.21	10	42
Justices with "huge" outlier	72	10	222

(d) Only the interquartile range is resistant because it is not affected by these outliers. The others are strongly affected by outliers.
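Resistance can be illustrated with a small Python sketch. The Justices' data are not reproduced here, so the sketch uses the Raleigh temperatures from Activity 5-1 as a stand-in and replaces the maximum with a made-up outlier of 178; both the stand-in data and the outlier value are assumptions chosen only for illustration. Quartiles follow the text's median-of-halves rule.

```python
from statistics import median, stdev

def spread_measures(values):
    """Return the standard deviation, IQR, and range of the values."""
    data = sorted(values)
    half = len(data) // 2
    iqr = median(data[-half:]) - median(data[:half])
    return {"sd": stdev(data), "iqr": iqr, "range": data[-1] - data[0]}

# Stand-in data: Raleigh temperatures from Activity 5-1 (not the Justices' data).
base = [39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78]
# Hypothetical outlier: the maximum (78) is replaced by 178.
with_outlier = base[:-1] + [178]

before = spread_measures(base)
after = spread_measures(with_outlier)

for label, m in [("original", before), ("with outlier", after)]:
    print(f"{label}: sd={m['sd']:.2f}, IQR={m['iqr']}, range={m['range']}")
```

The IQR is the same for both lists, while the range and standard deviation both grow once the outlier is introduced.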

Activity 5-5: Placement Exam Scores (cont.)

(a) The distribution of placement exam scores does appear to be roughly symmetric and mound-shaped.
(b) The upper endpoint is 10.221 + 3.859 = 14.080. The lower endpoint is 10.221 – 3.859 = 6.362.
(c) The scores in between these endpoints are 7, 8, 9, 10, 11, 12, 13, and 14. There are 16+15+17+32+17+21+12+16 = 146 of them. This proportion is 146/213 = .685.
(d) There are 202 of the 213 scores within two standard deviations of the mean. This proportion is .948.
(e) All 213 scores, a proportion of 1.0, fall within three standard deviations of the mean.
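The endpoints and proportions in parts (b)-(e) can be recomputed as follows. The mean, standard deviation, and counts are taken from the answers above; the empirical-rule benchmarks (about 68%, 95%, and 99.7% for mound-shaped distributions) are added for comparison and are not quoted from the answers. The names interval and counts_within are illustrative.

```python
# Mean and standard deviation of the placement exam scores, from part (b).
MEAN, SD = 10.221, 3.859
N = 213

# Counts of scores within 1, 2 and 3 standard deviations, from parts (c)-(e).
counts_within = {1: 146, 2: 202, 3: 213}

# Empirical-rule benchmarks (added for comparison; not from the answers above).
benchmark = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(k):
    """Endpoints of the interval mean +/- k standard deviations."""
    return MEAN - k * SD, MEAN + k * SD

proportions = {k: count / N for k, count in counts_within.items()}

for k in (1, 2, 3):
    low, high = interval(k)
    print(f"within {k} SD: {low:.3f} to {high:.3f}, "
          f"observed {proportions[k]:.3f}, benchmark {benchmark[k]}")
```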

Activity 5-6: SATs and ACTs

(a) Bobby scored 184 points above the SAT mean.
(b) Kathy scored 7.4 points above the ACT mean.
(c) Such a conclusion is not sensible because the scales on the two exams differ so greatly, with the SAT involving much higher numbers than the ACT.
(d) Bobby scored (1080-896)/174 = 1.06 standard deviations above the SAT mean.
(e) Kathy scored (28-20.6)/5.2 = 1.42 standard deviations above the ACT mean.
(f) Kathy has the higher z-score.
(g) Kathy performed better relative to the peers who took the same exam.
(h) Peter's z-score is (740-896)/174 = -0.90. Kelly's z-score is (19-20.6)/5.2 = -0.31.
(i) Kelly's z-score is higher (less negative).
(j) A z-score is negative when the observation's value is below the mean.
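The z-score calculations in parts (d), (e), and (h) can be sketched in Python as below. The means, standard deviations, and individual scores are those used in the answers above; the function name z_score and the dictionary layout are illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Exam means and standard deviations used in parts (d), (e), and (h).
SAT_MEAN, SAT_SD = 896, 174
ACT_MEAN, ACT_SD = 20.6, 5.2

z = {
    "Bobby (SAT 1080)": z_score(1080, SAT_MEAN, SAT_SD),
    "Kathy (ACT 28)": z_score(28, ACT_MEAN, ACT_SD),
    "Peter (SAT 740)": z_score(740, SAT_MEAN, SAT_SD),
    "Kelly (ACT 19)": z_score(19, ACT_MEAN, ACT_SD),
}

for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
```

Comparing z-scores rather than raw differences puts the two exams on a common scale, which is why Kathy's 7.4-point margin counts for more than Bobby's 184-point margin.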

Activity 5-7: Value of Statistics (cont.)

(a)-(b) Predictions will vary from student to student.
(c)

	class F	class G	class H	class I	class J
range	6	8	8	8	8
interquartile range	2.5	2	0	8	4
standard deviation	1.8	2.0	1.2	4	2.7

(d) This could be argued either way. G has a higher range and standard deviation, but F has a larger IQR. NOTE: Fathom's IQR (and quartiles) differ from the text's.
(e) Class I's scores have the most variability, while class H's have the least.
(f) Class F's scores have more bumpiness but not more variability than class G's.
(g) Class J has the highest number of distinct values but not the most variability.
(h) Neither "bumpiness" nor "variety" is directly related to "variability." This exercise shows that a bumpier distribution can have less variability than a less bumpy one and that a distribution with more variety of values can have less variability than one with less variety of values. Variability instead is a measure of how far apart the observations are from each other, i.e., their spread.