Workshop Statistics: Discovery with Data and Fathom

Topic 2: Data, Variables, and Fathom

Activity 2-1: Scrabble Names

(b)

Most of the names were 6 letters long. A few names were noticeably longer: Blackwell with 9 letters and Nightingale with 11 letters.

(c)

Most of the point totals were between 7 and 12, with no real peak. Two outliers stand out: Nightingale with 16 points and Blackwell with 20 points.

(d) Most letters: Nightingale; most points: Blackwell; not the same person.
(e) Fewest letters: Tukey with 5; fewest points: Gosset and Galton with 7 points each; not the same person.
(f)

(g)

The ratio values are much more evenly "spread out", with most falling between 1 and 2 points per letter.

(h) The highest ratio, 2.4, belongs to Tukey. His name has few letters, but some of them are quite valuable. Names like Nightingale earn many points, but that is not surprising given how many letters they contain.
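
The counts, point totals, and ratios quoted in this activity can be checked with a short script. This is only a sketch: it assumes the standard English-language Scrabble tile values, and the names `LETTER_POINTS` and `scrabble_summary` are made up for illustration. It covers only the five names mentioned in the answers above.

```python
# Standard English-language Scrabble tile values (assumed scoring scheme).
LETTER_POINTS = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def scrabble_summary(name):
    """Return (number of letters, total points, points per letter)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    points = sum(LETTER_POINTS[ch] for ch in letters)
    return len(letters), points, points / len(letters)


# Only the names mentioned in the answers above.
for name in ["Tukey", "Gosset", "Galton", "Blackwell", "Nightingale"]:
    n_letters, points, ratio = scrabble_summary(name)
    print(f"{name}: {n_letters} letters, {points} points, "
          f"{ratio:.2f} points/letter")
```

Dividing points by letters puts short and long names on the same scale, which is why Tukey can lead on the ratio without leading on total points.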

Activity 2-2: Gender of Physicians

(b) Most: 1. internal medicine, 32576; 2. pediatrics, 25633; 3. family practice, 16416
Fewest: 39. thoracic surgery, 13; 38. aerospace medicine, 39; 37. colon/rectal surgery, 59
(c) The number of physicians in each field. Sure, there are a lot of female internal medicine specialists, but there are a lot of male internal medicine specialists as well!
(d) Largest: 1. pediatrics, 46.25%; 2. medical genetics, 41.2%; 3. child psychiatry, 39.06%
Smallest: 39. urological, 2.63%; 38. orthopedic, 3.3%; 37. neurological surgery, 4.42%
(e) The answers don’t agree exactly. Some specialties have fewer doctors overall, so they also have fewer women, yet the proportion of women can still be high. For example, there are 32,476 women in internal medicine, but that field also has the largest number of physicians overall. There are only 103 women in medical genetics, but that’s a large fraction of the 250 specialists in that field.
(f)

Somewhere between 1000 and 2000 seems to be a fairly typical number of women, e.g. physical medicine and rehab.

(g)

A little under 25% women seems typical, e.g. pediatric cardiology (24.2%).

(h) Emergency medicine has 3662 women, but they make up only 17% of the physicians in emergency medicine. Can you find a more extreme example?
(i) Medical genetics has a small number of women (103), but they make up 41% of all specialists in this area. Again, there are numerous combinations.
(j) Describing the dotplot: The distribution is fairly symmetric, centered roughly between 20 and 25%. The highest percentage is pediatrics with 46% women.
(k) It can be very difficult to compare "counts". Since the number of physicians varies so much from specialty to specialty, the number of women can be misleading if we want to know more about the gender breakdown between the specialties, e.g. which specialties have "a lot of women" or which specialties are women more likely to choose?
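
The percentage used throughout this activity is the number of women divided by the total number of physicians in the specialty. As a minimal sketch, the function below (the name `percent_women` is hypothetical) reproduces the medical genetics figure from the two counts quoted in (e).

```python
def percent_women(women, total):
    """Percentage of a specialty's physicians who are women."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * women / total


# Medical genetics, using the counts quoted in (e): 103 women out of 250.
print(round(percent_women(103, 250), 1))
```

Because the total appears in the denominator, a small specialty can rank near the top by percentage while ranking near the bottom by count.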

Activity 2-3: Fan Cost Index

(a) highest: N.Y. Yankees, $166.82; lowest: Montreal, $87.87
(b)

Most dots are between $90 and $140, with a slight majority of these dots falling between $120 and $140. There are a few outliers, mostly on the higher end.

(c) - (e) Answers will vary from student to student.
(f) highest: N.Y. Mets, $3.50; lowest: Philadelphia, $1.25
(g) The term "small" is relative to the ballpark. The size of a "small" soda varies from ballpark to ballpark.
(h) highest: Boston, $0.18 per oz.; lowest: Montreal, $0.08 per oz.
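
The point of (g) and (h) is that a price means little until it is divided by the serving size. The sketch below illustrates the calculation. The prices and cup sizes in it are hypothetical, chosen only for illustration, and are not taken from the Fan Cost Index data. The function name `price_per_ounce` is also an assumption.

```python
def price_per_ounce(price_dollars, size_ounces):
    """Cost of one ounce of soda, in dollars."""
    if size_ounces <= 0:
        raise ValueError("size_ounces must be positive")
    return price_dollars / size_ounces


# Hypothetical example: two "small" sodas with the same price
# but different cup sizes (values are NOT from the data set).
smaller_cup = price_per_ounce(2.50, 12)
larger_cup = price_per_ounce(2.50, 20)

print(f"12 oz cup: ${smaller_cup:.3f} per oz")
print(f"20 oz cup: ${larger_cup:.3f} per oz")
```

Two ballparks charging the same price for a "small" soda can therefore differ considerably in cost per ounce.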

Activity 2-4: States' SAT Averages

(a) Highest average SAT: Iowa; Lowest average SAT: South Carolina
(b) Iowa had only 5% of its students take the exam; S. Carolina had 61%.
(c)

The number of states where "few" students take the SAT is only slightly lower than the number of states where more than 25% of the students take the exam.

(d)

The SAT averages for states where more than 25% of high school seniors took the exam appear reasonably mound shaped, centered around 1000. The SAT averages for states where fewer than 25% of high school seniors took the exam also look reasonably mound shaped, but centered much higher, around 1120. In fact, there is not much overlap between the two distributions.
One explanation could be that in states where only a small proportion of students take the SAT, the students who do tend to be college bound and may not represent the performance of all students in the state (especially in states that tend to use ACT scores instead). When a higher proportion of students take the SAT, the average describes a much more diverse population, lowering the overall averages for those states.

(e) A high average may not be a good indicator of how well the state prepares students for the exam (and college). As more students take the exam, it’s likely this will reduce the overall average. So, a high average for a state may simply result from a low percentage of people taking the exam.
(f) Answers will vary from student to student.
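
The explanation in (d) and (e) can be illustrated with a deliberately simplified model. The sketch below assumes that the strongest students are the first to take the exam and uses an entirely hypothetical, evenly spread set of scores. Neither assumption comes from the actual state data. The function name `mean_of_top_fraction` is made up for illustration. The 5% and 61% participation rates are the Iowa and S. Carolina figures quoted above.

```python
from statistics import mean


def mean_of_top_fraction(scores, fraction):
    """Mean score of the top `fraction` of students (0 < fraction <= 1).

    Simplifying assumption: the highest scorers are the ones who
    take the exam.
    """
    ranked = sorted(scores, reverse=True)
    # Always include at least one test-taker.
    n_takers = max(1, round(len(ranked) * fraction))
    return mean(ranked[:n_takers])


# Hypothetical statewide scores, evenly spread from 600 to 1400.
hypothetical_scores = list(range(600, 1401, 8))

low_participation = mean_of_top_fraction(hypothetical_scores, 0.05)
high_participation = mean_of_top_fraction(hypothetical_scores, 0.61)

print("Average with 5% participation: ", low_participation)
print("Average with 61% participation:", high_participation)
```

Under these assumptions the same population produces a lower average at the higher participation rate, which is the pattern described in (e). Real participation is not this cleanly ordered, so the sketch shows the direction of the effect rather than its size.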