Workshop Statistics: Discovery with Data, Second Edition
Topic 11: Least Squares Regression II
Activity 11-1: Gestation and Longevity
(a) gestation = 21.7 + 13.1 * longevity
(b) For each additional year of an animal's longevity, its predicted
gestation period is 13.1 days longer.
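Here is a minimal Python sketch of the fitted line from (a). It shows that raising longevity by one year always raises the predicted gestation by the slope, 13.1 days. The function name `predicted_gestation` is a hypothetical label for illustration and does not come from the text.

```python
def predicted_gestation(longevity_years: float) -> float:
    """Predicted gestation (days) from the fitted line in (a)."""
    return 21.7 + 13.1 * longevity_years

# Each additional year of longevity adds the slope, 13.1 days, to the prediction.
for years in (10, 11, 12):
    print(years, round(predicted_gestation(years), 1))
```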
(c) 44%
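The 44% in (c) appears to be r², the proportion of the variability in gestation that is explained by the least squares line. The sketch below computes the line and r² from paired lists using the usual sums of squares. The animal data are not reproduced here, so the usage lines rely on small made-up numbers, and the name `least_squares` is hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the least squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation coefficient
    return intercept, slope, r_squared

# Hypothetical data, for illustration only.
b0, b1, r2 = least_squares([1, 2, 3, 4], [2, 4, 5, 8])
print(f"y-hat = {b0:.2f} + {b1:.2f} x; r^2 = {r2:.0%}")
```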
(d) The predictions seem to be generally closer to the actual gestation
periods when longevity is very small.
(e) The elephant (residual = 98.23) is an outlier in both longevity
and gestation. However, 6 other animals have larger positive residuals,
and 6 other animals have negative residuals that are larger in absolute
value. So no, the elephant, while extreme in longevity and gestation,
does not have the largest residual.
(f) The giraffe's gestation period is much longer than expected for
an animal with its longevity (residual = 272).
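A residual is the observed value minus the value predicted by the line. A large positive residual, such as the giraffe's, therefore means the actual gestation is much longer than the line predicts. The minimal sketch below uses the line from (a). The animal in the usage line is hypothetical and its values are made up, not the giraffe's data.

```python
def residual(longevity_years: float, gestation_days: float) -> float:
    """Observed minus predicted gestation, using the line from (a)."""
    predicted = 21.7 + 13.1 * longevity_years
    return gestation_days - predicted

# Hypothetical animal: 10-year longevity, 300-day gestation (illustrative values only).
print(round(residual(10, 300), 1))
```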
(g) regression line: gestation = 9 + 13.6 * longevity; r² = .501
(h) no
(i) regression line: gestation = 45 + 11.1 * longevity; r² = .269
(j) elephant
(k) regression line: gestation = 110 + 5.26 * longevity; r² = .092
Thus, the regression line is not resistant to outliers, especially
points that are extreme in the horizontal direction.
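A small Python sketch with made-up points (not the animal data) illustrates this lack of resistance. An outlier placed at the center of the x values leaves the slope unchanged, while one that is extreme in the horizontal direction drags the slope almost to zero. The data and the helper name `slope` are hypothetical.

```python
from statistics import mean

def slope(x, y):
    """Least squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data lying exactly on y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

base = slope(x, y)                             # 2.0
vertical_outlier = slope(x + [3], y + [20])    # outlier at the mean of x: slope stays 2
horizontal_outlier = slope(x + [20], y + [5])  # outlier far out in x: slope about 0.023

print(base, vertical_outlier, horizontal_outlier)
```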
Activity 11-2: Residual Plots
(a)
(b)
- The residuals are randomly scattered: 1, d
- The residuals are largely randomly scattered except for two very large
  negative residuals: 4, c
- The residuals show a distinct curved pattern: 2, b
- The residuals show a clear linear pattern with three severe outliers: 3, a
(c) Plots 1 and 4 summarize the relationship in the data about as well
as possible. The points fall roughly evenly about the least squares
regression line. Plots 2 and 3 would best be described by some type
of curve.
(d) The scatterplots whose lines summarize the data about as well as
possible do not have the highest values of r². More points fall closer
to the line in the other two plots, even though those plots are not
best modeled by a linear fit.
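A high r² does not guarantee that a line is the right model. In the sketch below, hypothetical curved data (y = x² for x = 1, ..., 10) produce an r² of about 0.95. The residuals still show the telltale curved pattern: positive at both ends and negative in the middle. The helper `linear_fit` is a hypothetical name, not from the text.

```python
from statistics import mean

def linear_fit(x, y):
    """Least squares line of y on x; returns (intercept, slope, r_squared, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, sxy ** 2 / (sxx * syy), residuals

# Hypothetical curved data: y = x squared, for x = 1, ..., 10.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
intercept, slope, r2, residuals = linear_fit(x, y)
print(f"r^2 = {r2:.3f}")  # about 0.95
print("residual signs:", "".join("+" if r > 0 else "-" for r in residuals))  # ++------++
```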
Activity 11-3: Televisions and Life Expectancy (cont.)
(a) This relationship does not appear to be linear, but rather curved.
(c) life exp = 80.6 - 13.3 * log(per TV)
(d) .850
(e) 67.3
(f) 54; difference = 13.3, the magnitude of the slope coefficient
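The answers to (e) and (f) are consistent with "log" meaning the base-10 logarithm, evaluated at 10 and 100 people per TV. Under that assumption, each tenfold increase in people per TV raises log(per TV) by 1 and therefore lowers predicted life expectancy by 13.3 years. The inputs 10 and 100 are inferred from the answers rather than stated in the text, and `predicted_life_expectancy` is a hypothetical name.

```python
import math

def predicted_life_expectancy(people_per_tv: float) -> float:
    """Prediction from the fitted line in (c), assuming log means base-10 log."""
    return 80.6 - 13.3 * math.log10(people_per_tv)

at_10 = predicted_life_expectancy(10)    # log10(10) = 1
at_100 = predicted_life_expectancy(100)  # log10(100) = 2
print(round(at_10, 1), round(at_100, 1), round(at_10 - at_100, 1))
```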
(g) This scatterplot reveals no clear pattern.
(h) The linear regression model is a better fit with the transformed
data than with the original data.