Workshop Statistics: Discovery with Data, Second Edition

Topic 14: Probability

Activity 14-1: Random Babies

Students' answers to (a)-(i) may differ since the data is chosen randomly.  These are meant to be sample answers.
(a) (b)

repetition #     1     2     3     4     5
Johnson match?   no    no    no    no    no
All wrong?       yes   no    no    no    no
# of matches     0     1     1     1     1

(c)

                   Johnson matches?               All wrong?
Student  Cum reps  Of these 5  Cum tot  Cum prop  Of these 5  Cum tot  Cum prop
1        5         0           0        .000      2           2        .400
2        10        0           0        .000      1           3        .300
3        15        1           1        .067      2           5        .333
4        20        2           3        .150      2           7        .350
5        25        0           3        .120      3           10       .400
6        30        1           4        .133      2           12       .400
7        35        1           5        .143      3           15       .429
8        40        0           5        .125      1           16       .400
9        45        1           6        .133      3           19       .422
10       50        2           8        .160      1           20       .400
11       55        1           9        .164      2           22       .400
12       60        1           10       .167      3           25       .417
13       65        2           12       .185      1           26       .400
14       70        0           12       .171      2           28       .400
15       75        4           16       .213      1           29       .387
16       80        0           16       .200      3           32       .400
17       85        2           18       .212      1           33       .388
18       90        2           20       .222      1           34       .378
19       95        2           22       .232      1           35       .368
20       100       4           26       .260      1           36       .360

(d)
(e) The proportion of trials in which Johnson matches appears to be approaching .25.  The proportion of trials with no matches appears to be approaching .35.
(f)

# of matches   0     1     2     3     4     Total
Count          35    37    22    0     6     100
Proportion     .35   .37   .22   .00   .06   1.0

(g) .65
(h) empirical probability of no matches =.35
(i) empirical probability of at least one match=.65
(j) An outcome of exactly three matches is impossible: if three babies go to the correct mothers, only one baby and one mother remain, so the fourth must match as well.
(k) It is not impossible to get four matches, but it is rare.
(l) A result of 0, 1, or 2 matches is not unlikely.
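
The hand simulation in this activity can also be run by computer. The following is a minimal sketch, not part of the original activity: the function name simulate_random_babies is hypothetical, the mothers are numbered 0 to 3, and mother 0 arbitrarily stands in for Johnson. It assumes every assignment of the four babies is equally likely.

```python
import random
from collections import Counter

def simulate_random_babies(repetitions, seed=None):
    """Hand four babies to four mothers at random and tally the matches."""
    rng = random.Random(seed)
    match_counts = Counter()   # number of matches -> how many repetitions
    johnson_matches = 0        # repetitions in which mother 0 gets her own baby
    for _ in range(repetitions):
        babies = [0, 1, 2, 3]  # baby i belongs to mother i
        rng.shuffle(babies)    # babies[i] is the baby handed to mother i
        matches = sum(1 for mother, baby in enumerate(babies) if mother == baby)
        match_counts[matches] += 1
        if babies[0] == 0:     # assumption: mother 0 plays the role of Johnson
            johnson_matches += 1
    return match_counts, johnson_matches

reps = 10_000
counts, johnson = simulate_random_babies(reps, seed=1)
for k in range(5):
    print(f"{k} matches: {counts[k] / reps:.3f}")
print(f"Johnson match: {johnson / reps:.3f}")
```

With many repetitions the printed proportions should settle near the values suggested in (e), and the count for exactly three matches stays at zero, as argued in (j).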
 

Activity 14-2: Random Babies (cont.)

(a) 24
(b)
 
1234:  4 1243:  2 1324:  2 1342:  1 1423:  1 1432:  2
2134:  2 2143:  0 2314:  1 2341:  0 2413:  0 2431:  1
3124:  1 3142:  0 3214:  2 3241:  1 3412:  0 3421:  0
4123:  0 4132:  1 4213:  1 4231:  2 4312:  0 4321:  0

(c) 4: 1    3: 0    2: 6    1: 8    0: 9
(d) 4: .042    3: 0    2: .25    1: .333    0: .375
(e) the graph that represents 10,000 times
Students' answers to (f)-(g) may differ.
(f) 1.05
(g) 4(.042) + 3(0) + 2(.25) + 1(.333) + 0(.375) = 1
This is reasonably close to the average number of matches from the simulated data.
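
The listing in (b) through (g) can be checked by enumerating all 24 arrangements directly. This is a sketch under the same equally-likely assumption used in the activity; the function name match_distribution is hypothetical, and exact fractions are used so that the expected value is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def match_distribution(n=4):
    """Count matches over every arrangement of n babies among n mothers."""
    counts = Counter()
    for arrangement in permutations(range(n)):
        # A match occurs when the baby in position i is baby i.
        matches = sum(1 for position, baby in enumerate(arrangement) if position == baby)
        counts[matches] += 1
    total = sum(counts.values())
    probabilities = {k: Fraction(c, total) for k, c in counts.items()}
    expected = sum(k * p for k, p in probabilities.items())
    return counts, probabilities, expected

counts, probabilities, expected = match_distribution()
print(dict(sorted(counts.items())))   # arrangements with 0, 1, 2, 4 matches
print({k: float(p) for k, p in sorted(probabilities.items())})
print(expected)                       # expected number of matches
```

Working in fractions shows that the sum in (g) is exactly 1; the rounded decimals in (d) give 1.001.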

Activity 14-3: Weighted Coins

(a)

repetition           1st coin  2nd coin  3rd coin  4th coin  5th coin  6th coin
1                    H         H         T         H         H         T
2                    H         H         T         H         H         H
3                    T         H         H         H         H         T
4                    H         H         T         T         H         T
5                    H         H         T         H         H         T
relative frequency   .8        1.0       .2        .8        1.0       .2

(b)-(e) Answers will vary from student to student.
(f) Probability describes the long-term behavior of random processes, not the short-term behavior, because many trials are needed before an accurate probability can be estimated.  For instance, the graph shows a lot of variation in the relative frequencies over the first few repetitions.  However, as the six coins are flipped more and more often, the relative frequencies stabilize and reveal the long-term behavior we are interested in.
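
The stabilizing behavior described in (f) can be illustrated with a short simulation. This is only a sketch: the function name running_relative_frequency is hypothetical, and the heads probability of 0.8 is an assumed value chosen for illustration, since the actual weights of the six coins are not given here.

```python
import random

def running_relative_frequency(p_heads, flips, seed=None):
    """Return the relative frequency of heads after each flip of one weighted coin."""
    rng = random.Random(seed)
    heads = 0
    frequencies = []
    for n in range(1, flips + 1):
        if rng.random() < p_heads:   # heads with probability p_heads
            heads += 1
        frequencies.append(heads / n)
    return frequencies

# Assumed weight: p_heads=0.8 is hypothetical, not taken from the activity.
freqs = running_relative_frequency(p_heads=0.8, flips=5_000, seed=2)
for n in (5, 50, 500, 5_000):
    print(n, round(freqs[n - 1], 3))
```

The early values can sit far from the assumed weight, while the later values should lie close to it.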
 

Activity 14-4: Boy and Girl Births

Students' answers to (a)-(d) may differ since the data is chosen randomly.  These are meant to be sample answers.
(a)
Family 1

Child number   1     2     3     4      # girls
Random digit   1     5     5     2      1
Gender         boy   boy   boy   girl
 

(b)

Family #   1     2     3     4     5
# girls    1     1     3     4     2

(c)

# of girls in family       0     1     2     3     4
# of simulated families    8     24    35    26    7
Empirical probability      .08   .24   .35   .26   .07

(d) They were very close.
(e) The probability of having 2 children of the same gender is .375. The probability of having 3 children of the same gender is .25+.25=.50 since we could have 3 girls and 1 boy or 3 boys and 1 girl. Having two different ways to accomplish "3 children of the same gender" results in a higher probability.
(f) Answers will vary from student to student.
(g) 4-children families: .381;  10-children families: .257
The probability of an exact 50-50 split decreases as family size increases, because the probability is spread over more possible values.
Note: The above statement is the answer to (h) in the Minitab version.
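
The probabilities quoted in (e) can be reproduced with the binomial formula. The sketch below assumes independent births with P(girl) = .5; the function name prob_girls is hypothetical. It also computes the exact probability of an even split for 4 and 10 children, for comparison with the figures reported in (g), which may come from simulation.

```python
from math import comb

def prob_girls(n_children, n_girls, p_girl=0.5):
    """Binomial probability of exactly n_girls girls among n_children births."""
    return comb(n_children, n_girls) * p_girl**n_girls * (1 - p_girl)**(n_children - n_girls)

two_and_two = prob_girls(4, 2)                        # 2 girls, 2 boys
three_and_one = prob_girls(4, 3) + prob_girls(4, 1)   # 3 of one gender, 1 of the other
even_split_4 = prob_girls(4, 2)
even_split_10 = prob_girls(10, 5)

print(f"2-2 split, 4 children:   {two_and_two:.3f}")
print(f"3-1 split, 4 children:   {three_and_one:.3f}")
print(f"even split, 4 children:  {even_split_4:.3f}")
print(f"even split, 10 children: {even_split_10:.3f}")
```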

Activity 14-5: Hospital Births

(a) The proportion of days that hospital A observed an equal count of girls and boys is about .247 (90/365).  The proportion of days that hospital B observed an equal count of girls and boys is about .126 (46/365).  The smaller hospital has more days with an exact 50/50 split.
(b) hospital A
(c) hospital B
(d) While a larger sample size makes it less likely to get an exact 50/50 split in the observed counts, the probability of getting a sample proportion close to 1/2 increases with a larger sample.
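
The contrast in (d) can be made concrete with exact binomial calculations. This is a sketch under stated assumptions: births are independent with P(girl) = 1/2, the daily birth counts of 10 and 40 are hypothetical stand-ins for a small and a large hospital, "close to 1/2" is taken to mean within .1, and the function names are invented for illustration.

```python
from fractions import Fraction
from math import comb

def prob_exact_split(n):
    """P(exactly half of n births are girls), assuming P(girl) = 1/2."""
    if n % 2 != 0:
        return Fraction(0)
    return Fraction(comb(n, n // 2), 2**n)

def prob_close_to_half(n, tolerance=Fraction(1, 10)):
    """P(sample proportion of girls is within tolerance of 1/2)."""
    favorable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - Fraction(1, 2)) <= tolerance
    )
    return Fraction(favorable, 2**n)

# Hypothetical daily birth counts for a small and a large hospital.
for n in (10, 40):
    print(n, round(float(prob_exact_split(n)), 3), round(float(prob_close_to_half(n)), 3))
```

Under these assumptions the exact-split probability falls as n grows, while the probability of a proportion near 1/2 rises.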