Workshop Statistics: Discovery with Data, Second Edition
Topic 3: Displaying and Describing Distributions
Activity 3-1: Features of Distributions
(a) center of distribution
(b) variability or spread of distribution
(c) shape of distribution
(d) 2 distinct clusters of scores
(e) outliers (one low, one high) that aren't with the rest of the data
(f) granularity (data at fixed intervals); here the data occur in multiples
of 5
Activity 3-2: Matching Variables to Dotplots
Note: There is a typo in the text; there should be
a different dotplot for (7). The correct plot is fairly spread out, with slight
skewness to the left.
(a) - (3) because the values contain no repeats and are fairly evenly
spread out
(b) - (6) because some cities have zero snowfall, and there is a lot
of variation among the others
(c) - (5) because the values are at regular increments, and there are
mostly ones and twos with a gradual drop-off beyond that
(d) - (4) because the increments are fairly regular and there are many
prices with two properties at the same price
(e) - (2) because there are many different values (large variability)
and more repeated values than dotplot 7 (weights are reported as integers)
(f) - (1) because the values are slightly skewed to the right, with
a concentration at the lower end. The skewness to the right makes
sense: we expect a few mothers to be a fair bit older than average,
but not as many to be much younger than average.
(g) - (7) because there is a wide range of values with slight skewness
to the left. It makes sense that there might be a few cars that are very light,
but not as many cars that are extremely heavy. There is less granularity than
in dotplot 2 (a larger number of distinct values).
(h) - (8) because the scores are skewed left: most of the students
were at the high end, perhaps with scores in the 80s and 90s, and a few students
were lower in the distribution, not scoring well on this exam
Activity 3-3: Rowers' Weights
(a) Four rowers weigh 195 pounds. This value has the tallest stack of dots
on the dotplot.
(b) The shape of the distribution is skewed to the left, with the center
around 195 pounds. The spread is from 120 to 230 pounds, with 2 clusters
and an outlier near 120 pounds.
(c) There is one cluster around 150-160 pounds. If we look at the events of
those rowers, there is an LW designation each time. These rowers
participate in "lightweight" events, which require them to weigh below a
certain amount (165) on race day. The rowers in the upper cluster are not in
lightweight events, and there is no upper bound on how much they can weigh.
(d) The apparent outlier is Segaloff, the coxswain, who calls out instructions
but does not row and is therefore light so as to add little extra weight
to the boat.
Activity 3-4: British Rulers' Reigns
(a) 63 years, Victoria
(b) 0 years, Edward V. This ruler must have ruled for less than
6 months, which was rounded down to 0 years.
(c)
0| 9026536791
1| 3907332305
2| 10224255
3| 555983
4| 4
5| 609
6| 3
(d)
0| 0123566799
1| 0023333579
2| 01222455
3| 355589
4| 4
5| 069
6| 3
(e) The distribution of lengths of reign of British rulers ranges from
0 to 63 years and is skewed to the right. A large cluster of rulers
reigned under 40 years. There is a small cluster of 50-some-year
reigns, and there are no major outliers.
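The step from the unsorted stemplot in (c) to the sorted one in (d) can be checked with a short script. This is only a sketch: the reign lengths are read back from the leaves in (c) (stem = tens digit, leaf = units digit), and the names unsorted_stemplot, reigns, and sorted_stemplot are made up for this illustration.

```python
from collections import defaultdict

# Leaves copied from the unsorted stemplot in part (c).
# Each stem is the tens digit; each leaf is the units digit.
unsorted_stemplot = {
    0: "9026536791",
    1: "3907332305",
    2: "10224255",
    3: "555983",
    4: "4",
    5: "609",
    6: "3",
}

# Rebuild the reign lengths (in years) from stems and leaves.
reigns = [10 * stem + int(leaf)
          for stem, leaves in unsorted_stemplot.items()
          for leaf in leaves]


def sorted_stemplot(values):
    """Return stemplot lines with the leaves sorted within each stem."""
    rows = defaultdict(list)
    for v in values:
        rows[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in sorted(rows[stem]))
        lines.append(f"{stem}| {leaves}")
    return lines


stemplot_lines = sorted_stemplot(reigns)
print("\n".join(stemplot_lines))
print("shortest:", min(reigns), "longest:", max(reigns))
```

The printed rows should agree with the stemplot in (d), and the shortest and longest values should agree with the answers to (a) and (b).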
Activity 3-5: Geyser Eruptions
(a) 7
(b) 12+23+54+53+16+6 = 164; 164/222 = .739 (approximately)
(c) No, since 90 is not a starting point of a histogram's interval.
(d) There seem to be two clusters, one in the 50's and one in the high
70's and low 80's (minutes). It turns out that the time between eruptions
depends on whether the previous eruption was long or short.
(e) The different subinterval widths change the histogram's appearance
dramatically. With 5 subintervals the two clusters are not apparent, and
with 20 subintervals the distribution looks very jagged. The most informative
picture is probably the histogram with 10 subintervals.
(e) and (f) for the Calculator version
(e) and (f) for the Minitab version
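The arithmetic in part (b) of this activity can be verified with a few lines of code. This is a minimal sketch: the six bar counts and the total of 222 are taken from the answer to (b), and the variable names are chosen only for this illustration.

```python
# Histogram bar counts summed in part (b), and the total number of
# eruptions used as the denominator there.
counts = [12, 23, 54, 53, 16, 6]
total_eruptions = 222

selected = sum(counts)                  # eruptions in the chosen bars
proportion = selected / total_eruptions

print(selected)                         # sum of the six bars
print(round(proportion, 3))             # proportion to three decimals
```

The sum should come out to 164 and the proportion to about .739, matching the answer above.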