Workshop Statistics: Discovery with Data, Second Edition
Topic 27: Inference for Correlation and Regression
Activity 27-1: Baseball Payrolls
(a) There appears to be a moderate positive association between payroll
and winning percentage.
(b)
These teams seem to have both larger payrolls and higher winning percentages.
(c) Positive relationship with moderate strength. Student
guesses will vary.
Actual r=.630
(d) Answers will vary. These are intended to
be sample answers.
Below is one possible random assignment.
payroll | win pct
70.3710 | 0.598765
75.0650 | 0.444444
55.3685 | 0.475309
42.1428 | 0.588957
54.3925 | 0.395062
15.1500 | 0.635802
55.5640 | 0.456790
71.1358 | 0.413580
42.9274 | 0.419753
16.3630 | 0.530864
71.3314 | 0.617284
30.5165 | 0.595092
24.2177 | 0.484472
46.2482 | 0.475309
45.9322 | 0.459627
46.0096 | 0.465839
The correlation coefficient for these two columns is r=-.281
(e) This correlation is not nearly as large as the one observed in
the sample (.630).
(f)
(g) The graph is fairly symmetric, centered around zero.
(h) None of the simulated correlations is close to the correlation observed in
the sample, so the sample correlation (.630) is very unlikely to happen by chance
alone. This indicates that the relationship between winning percentage and
payroll is statistically significant.
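The randomization in parts (d) through (h) can be sketched in Python. This is a minimal sketch, assuming Python's standard `random` module for the shuffling; the helper names `correlation` and `shuffled_correlations`, the 10,000 repetitions and the seed are illustrative choices, not part of the activity. Every shuffle re-pairs the same 16 payrolls and winning percentages, so the listed assignment supplies all the values the simulation needs; the observed r = .630 is taken from part (c).

```python
import math
import random

# Payrolls and winning percentages for the 16 teams, in the randomly
# shuffled pairing listed in part (d).
payroll = [70.3710, 75.0650, 55.3685, 42.1428, 54.3925, 15.1500, 55.5640, 71.1358,
           42.9274, 16.3630, 71.3314, 30.5165, 24.2177, 46.2482, 45.9322, 46.0096]
win_pct = [0.598765, 0.444444, 0.475309, 0.588957, 0.395062, 0.635802, 0.456790, 0.413580,
           0.419753, 0.530864, 0.617284, 0.595092, 0.484472, 0.475309, 0.459627, 0.465839]


def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shuffled_correlations(x, y, reps=10000, seed=1):
    """Correlations from repeatedly re-pairing y with x at random.

    reps and seed are illustrative choices, not values from the activity.
    """
    rng = random.Random(seed)
    y = list(y)  # shuffle a copy so the caller's list is untouched
    results = []
    for _ in range(reps):
        rng.shuffle(y)
        results.append(correlation(x, y))
    return results


observed_r = 0.630  # sample correlation for the actual pairing, from part (c)
r_listed = correlation(payroll, win_pct)
sims = shuffled_correlations(payroll, win_pct)
prop_as_extreme = sum(r >= observed_r for r in sims) / len(sims)

print(f"r for the listed assignment: {r_listed:.3f}")
print(f"mean of simulated r: {sum(sims) / len(sims):.3f}")
print(f"proportion of simulated r >= {observed_r}: {prop_as_extreme:.4f}")
```

The first line reproduces the r = -.281 from part (d); the last line estimates how often chance re-pairing alone produces a correlation at least as large as the sample's.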
(i) t = .630 sqrt(16-2) / sqrt(1 - .630^2) = 3.03 with 16-2 = 14
degrees of freedom.
Table III indicates that .001 < one-sided p-value < .005.
With such a small p-value, we have strong evidence against the null
hypothesis of no association, suggesting that there is an association between
payroll and winning percentage.
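The test statistic in part (i) can be checked in Python. This is a sketch under stated assumptions: instead of reading Table III, it approximates the one-sided p-value by integrating the t density numerically with Simpson's rule, a swapped-in technique standing in for a statistics library. The helper names, the upper cutoff of 200 and the 20,000 steps are illustrative choices. With r rounded to .630 it gives t of about 3.035, which agrees with the 3.03 above to within rounding.

```python
import math

r, n = 0.630, 16  # sample correlation and number of teams
df = n - 2


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, upper=200.0, steps=20000):
    """One-sided p-value P(T >= t) via Simpson's rule on [t, upper].

    The probability beyond `upper` is negligible for these df; steps must be even.
    """
    h = (upper - t) / steps
    total = t_density(t, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(t + i * h, df)
    return total * h / 3


# t = r sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p = upper_tail_p(t, df)
print(f"t = {t:.3f}, df = {df}, one-sided p = {p:.4f}")
```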
Activity 27-2: Studying and Grades
(a) observational units: students surveyed
variable 1: hours of study (quantitative)
variable 2: GPA (quantitative)
(b)
There appears to be a fairly weak positive association between hours
studied per week and GPA.
(c) Predicted GPA = 2.89 + .0894 (hours/week)
correlation coefficient, r=.343
(d) The slope, .0894, indicates the predicted change in GPA for each additional
hour of study per week.
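The slope interpretation in part (d) can be seen by evaluating the fitted line from part (c) at two hour values one unit apart. This is a minimal sketch; the function name `predicted_gpa` and the inputs 10 and 11 hours are hypothetical, chosen only to illustrate that consecutive predictions differ by exactly the slope.

```python
def predicted_gpa(hours_per_week):
    """Fitted line from part (c): predicted GPA = 2.89 + .0894 (hours/week)."""
    return 2.89 + 0.0894 * hours_per_week


# Hypothetical, illustrative hour values one unit apart.
for hours in (10, 11):
    print(hours, round(predicted_gpa(hours), 4))

# The difference between the two predictions equals the slope, .0894.
print("difference:", round(predicted_gpa(11) - predicted_gpa(10), 4))
```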
(e) No, because of sampling variability.
(f) The correlation and the regression slope are both essentially zero.
Answers will vary; these are intended as sample
answers.
Note: the answers to (g)-(s) below are the answers
to (h)-(t) in the Minitab version.
(g) Simulated slopes (column bs):
-0.0245873   0.0126319  -0.0007703   0.0307915  -0.0238198
-0.0216128  -0.0191908   0.0065811  -0.0245659  -0.0283816
0.0386437  -0.0215482  -0.0247520   0.0427848   0.0099658
0.0094685   0.0041925  -0.0075949  -0.0114600  -0.0241534
(h)
(i) .0894 is nowhere near any of these values. This indicates that
the sample result of .0894 is extremely unlikely to occur by chance alone.
(j) The mean is close to zero, and the standard deviation is .0225.
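Parts (i) and (j) can be checked directly from the 20 simulated slopes listed in part (g). This sketch uses Python's standard `statistics` module; `stdev` uses the n - 1 divisor, and with that choice the standard deviation of these 20 values comes out close to the .0225 reported in (j). The variable names are illustrative.

```python
import statistics

# The 20 simulated slopes (column bs) listed in part (g).
bs = [-0.0245873, 0.0126319, -0.0007703, 0.0307915, -0.0238198,
      -0.0216128, -0.0191908, 0.0065811, -0.0245659, -0.0283816,
      0.0386437, -0.0215482, -0.0247520, 0.0427848, 0.0099658,
      0.0094685, 0.0041925, -0.0075949, -0.0114600, -0.0241534]

sample_slope = 0.0894  # slope from the actual sample, part (c)
mean_b = statistics.mean(bs)
sd_b = statistics.stdev(bs)  # sample standard deviation (divisor n - 1)
n_as_extreme = sum(b >= sample_slope for b in bs)

print(f"mean = {mean_b:.4f}, SD = {sd_b:.4f}, max = {max(bs):.4f}")
print(f"simulated slopes >= {sample_slope}: {n_as_extreme} of {len(bs)}")
```

None of the 20 simulated slopes reaches .0894; the largest is about .043.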
(k) For the sample data, the standard error of the slope coefficient is .02771.
This is close to the simulated standard deviation of .0225.
(l) t=.0894/.02771 = 3.23 with df=80-2=78.
(m) Using the df = 80 row of the table, .0005 < one-sided p-value < .001.
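Parts (l) and (m) can be reproduced numerically with the exact df = 78 rather than the df = 80 table row. This is a sketch under stated assumptions: it approximates the one-sided p-value by integrating the t density with Simpson's rule, a swapped-in technique in place of the printed table or a statistics library. The helper names, the upper cutoff of 200 and the 20,000 steps are illustrative choices.

```python
import math

b, se_b, n = 0.0894, 0.02771, 80  # sample slope, its standard error, sample size
df = n - 2


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, upper=200.0, steps=20000):
    """One-sided p-value P(T >= t) via Simpson's rule on [t, upper].

    The probability beyond `upper` is negligible here; steps must be even.
    """
    h = (upper - t) / steps
    total = t_density(t, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(t + i * h, df)
    return total * h / 3


t = b / se_b  # test statistic: slope divided by its standard error
p = upper_tail_p(t, df)
print(f"t = {t:.2f}, df = {df}, one-sided p = {p:.5f}")
```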
(n) This small p-value is consistent with the simulation results, which
never produced a sample slope above .045.
(o) Yes
(p) t* (with n-2 = 78 df) = 1.990
b ± 1.990 (.0225) = .0894 ± .0448 = (.045, .134)
(q) We are 95% confident that the population slope, the mean increase in GPA for
each additional hour of study per week, is between .045 and .134.
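The interval in part (p) is b ± t* × SE. This minimal sketch simply reproduces that arithmetic with the values used there: t* = 1.990 and the standard error of .0225; the variable names are illustrative.

```python
b = 0.0894      # sample slope from part (c)
t_star = 1.990  # critical value used in part (p)
se = 0.0225     # standard error used in part (p)

margin = t_star * se  # margin of error, about .0448
lower, upper = b - margin, b + margin
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```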
(r)
The sample line appears quite different from the simulated lines.
(s) Thus a slope this extreme is not very likely to occur if there
is no relationship between the two variables in the population. This indicates
that we have strong evidence of an association between hours of study and
GPA.
Activity 27-3: Studying and Grades (cont.)
(a)-(b)
The plot of the residuals looks reasonably symmetric and normal. I would consider
the normality condition met.
The plot of residuals vs. hrs/week does not seem to show a strong nonlinear
pattern. The variability in the residuals is also relatively constant
(except perhaps at the high hrs/week end, but we don't have many observations
there).