Exam 1 Review Problems

1) Suppose that the observational units in a study are the patients arriving at an emergency room on a given day. For each of the following, indicate whether it can legitimately be considered a variable. If it is a variable, classify it as categorical (noting whether it is also binary) or quantitative. If it is not a variable, explain why not.

a. Blood type

b. Waiting time

c. Mode of arrival (ambulance, personal car, on foot, other)

d. Whether or not men have to wait longer than women

e. Number of patients who arrive before noon

f. Whether or not the patient is insured

g. Number of stitches required

h. Whether or not stitches are required

i. Which patients require stitches

j. Number of patients who are insured

k. Assigned room number

2) When a tennis racquet is spun, is it equally likely to land with its label facing up or down? (This technique is often used to decide who should serve first.) Or does the spinning process favor one outcome more than the other? A statistics professor once investigated this question by spinning his tennis racquet many times. For each spin he recorded whether the racquet landed with the label up or down.

(a) Describe (in words) the relevant parameter whose value is being investigated with this study.

(b) Write the appropriate null and alternative hypotheses (in symbols).

He spun his racquet 100 times, finding that it landed with the label up in 46 of those spins.

(c) Would you consider these 100 spins to be a sample from a random process or a random sample from a population? Explain briefly.

(d) Describe how you could use a coin to conduct a simulation analysis of whether this result constitutes strong evidence that his racquet spinning process is not equally likely to land with its label facing up or down. Provide enough detail that someone else could implement the simulation and draw the appropriate conclusion.

(e) Use technology to simulate 1,000 repetitions of 100 spins each. Use the simulation results to produce an approximate p-value. Be very clear about how you are carrying out this simulation and how you are finding the approximate p-value.
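A minimal Python sketch of one way to run the simulation in (e). It assumes a fair process (π = .5) under the null hypothesis and a two-sided definition of "at least as extreme": a count at least 4 away from 50, i.e., 46 or fewer or 54 or more "up" results. The seed is arbitrary; changing it will change the approximate p-value slightly.

```python
import random

def simulate_p_value(n_spins=100, n_reps=1000, observed_up=46, seed=1):
    """Approximate two-sided p-value by simulating a fair (pi = 0.5) spinning process."""
    rng = random.Random(seed)
    expected_up = n_spins / 2                      # 50 "up" results expected under H0
    observed_distance = abs(observed_up - expected_up)
    as_extreme = 0
    for _ in range(n_reps):
        # One repetition: 100 spins, each landing "up" with probability 0.5
        ups = sum(rng.random() < 0.5 for _ in range(n_spins))
        # Count repetitions at least as far from 50 as the observed 46 was
        if abs(ups - expected_up) >= observed_distance:
            as_extreme += 1
    return as_extreme / n_reps

approx_p = simulate_p_value()
print(f"Approximate two-sided p-value: {approx_p:.3f}")
```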

(f) Use the binomial distribution to calculate the p-value exactly. (Be sure to indicate how you calculate this probability: what values you use for n and π, and what region you find the probability of.)
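For (f), a sketch of the exact calculation using only Python's standard library. It assumes n = 100, π = .5, and a two-sided alternative, so the region is X ≤ 46 or X ≥ 54. The helper names binom_pmf and binom_cdf are illustrative, not taken from any particular package.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(binom_pmf(i, n, pi) for i in range(k + 1))

n, pi0, observed = 100, 0.5, 46
lower_tail = binom_cdf(observed, n, pi0)                # P(X <= 46)
upper_tail = 1 - binom_cdf(n - observed - 1, n, pi0)    # P(X >= 54) = 1 - P(X <= 53)
p_value = lower_tail + upper_tail
print(f"P(X <= 46) = {lower_tail:.4f}, two-sided p-value = {p_value:.4f}")
```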

(g) Check whether the normal approximation (Central Limit Theorem) is valid here.

(h) Describe what the CLT says about the (approximate) sampling distribution of the sample proportion p̂, assuming that the null hypothesis is true. Be sure to describe each of shape, mean, and standard deviation, and to include a rough (but well-labeled) sketch of the distribution.
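The check in (g) and the mean and standard deviation in (h) reduce to a few lines of arithmetic. This sketch assumes the common rule of thumb that nπ and n(1 - π) should both be at least 10; some texts use a different cutoff.

```python
from math import sqrt

n, pi0 = 100, 0.5

# Validity check for the normal approximation, evaluated at the null value pi0
expected_up = n * pi0
expected_down = n * (1 - pi0)
clt_ok = expected_up >= 10 and expected_down >= 10

# Under H0, the CLT says p-hat is approximately normal with this mean and SD
mean_phat = pi0
sd_phat = sqrt(pi0 * (1 - pi0) / n)
print(f"CLT conditions met: {clt_ok}; mean = {mean_phat}, SD = {sd_phat:.3f}")
```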

(i) Calculate and interpret the test statistic by finding the z-score for the observed sample proportion p̂.

(j) Determine the (approximate) p-value from the standard normal distribution.
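A sketch for (i) and (j), using statistics.NormalDist from the standard library. It assumes a two-sided alternative, so the p-value doubles the standard normal area beyond the observed z-score.

```python
from math import sqrt
from statistics import NormalDist

n, pi0, observed = 100, 0.5, 46
p_hat = observed / n
sd_null = sqrt(pi0 * (1 - pi0) / n)        # SD of p-hat when H0 is true

z = (p_hat - pi0) / sd_null                # number of SDs p-hat lies from 0.5
p_value = 2 * NormalDist().cdf(-abs(z))    # two-sided area from the standard normal
print(f"z = {z:.2f}, approximate p-value = {p_value:.4f}")
```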

(k) What test decision would you make at the .05 significance level?

(l) Do the conditions for the (Wald) normal-based confidence interval hold here?

(m) Produce and interpret a 95% confidence interval for the parameter, using the Wald procedure if the conditions are met but using the Adjusted Wald procedure if they are not met.
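For (l) and (m), a sketch assuming the usual Wald condition (at least 10 observed successes and 10 observed failures) and, for the Adjusted Wald fallback, the "plus four" version that adds 2 successes and 2 failures before applying the Wald formula at 95% confidence.

```python
from math import sqrt
from statistics import NormalDist

n, x = 100, 46
z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence

def wald_interval(x, n, z_star):
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def adjusted_wald_interval(x, n, z_star):
    # "Plus four": add 2 successes and 2 failures, then use the Wald formula
    return wald_interval(x + 2, n + 4, z_star)

# Wald conditions use the observed counts of "up" and "down" results
wald_ok = x >= 10 and (n - x) >= 10
if wald_ok:
    lower, upper = wald_interval(x, n, z_star)
else:
    lower, upper = adjusted_wald_interval(x, n, z_star)
print(f"Wald conditions met: {wald_ok}; 95% CI = ({lower:.4f}, {upper:.4f})")
```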

(n) Is the confidence interval consistent with the test decision? Explain.

(o) Summarize your conclusion about the original question that motivated this study (be sure to comment on significance, confidence, and generalizability).

(p) Summarize how your calculations and conclusions would change if you instead examined the 54 spins that landed label down.

(q) Use the binomial distribution to determine the rejection region (in terms of number of “up” results in the sample) for the .05 significance level.

(r) Use the binomial distribution to determine the power of this test, using the .05 significance level, when the actual probability of this spun tennis racquet landing “up” is .65.
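A sketch for (q) and (r). It assumes a two-sided test that keeps each tail's null probability at or below .025 (one common way to split α; because the binomial is discrete, the achieved significance level ends up below .05). The function names are hypothetical helpers.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def binom_cdf(k, n, pi):
    """P(X <= k); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, pi) for i in range(k + 1))

def rejection_region(n, pi0, alpha):
    """Equal-tails region: reject H0 when X <= lower or X >= upper."""
    lower = max((k for k in range(n + 1)
                 if binom_cdf(k, n, pi0) <= alpha / 2), default=-1)
    upper = min((k for k in range(n + 1)
                 if 1 - binom_cdf(k - 1, n, pi0) <= alpha / 2), default=n + 1)
    return lower, upper

def power(n, pi_true, lower, upper):
    """Probability that X falls in the rejection region when pi = pi_true."""
    return binom_cdf(lower, n, pi_true) + (1 - binom_cdf(upper - 1, n, pi_true))

lower, upper = rejection_region(100, 0.5, 0.05)
achieved_alpha = power(100, 0.5, lower, upper)   # same formula, evaluated under H0
pw = power(100, 0.65, lower, upper)
print(f"Reject H0 if X <= {lower} or X >= {upper}; achieved alpha = {achieved_alpha:.4f}")
print(f"Power when pi = .65: {pw:.4f}")
```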

(s) Use the normal approximation to determine the rejection region (in terms of the sample proportion p̂) for the .05 significance level.

(t) Use the normal approximation to determine the power of this test, using the .05 significance level, when the actual probability of this spun tennis racquet landing “up” is .65.
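For (s) and (t), the same idea with the normal approximation, sketched under the assumption of a two-sided test at α = .05. The cutoffs come from the null distribution of p̂ (mean .5, SD .05), and power is the area beyond those cutoffs under a normal curve centered at .65 with SD sqrt(.65 × .35 / 100).

```python
from math import sqrt
from statistics import NormalDist

n, pi0, pi_alt, alpha = 100, 0.5, 0.65, 0.05
z_star = NormalDist().inv_cdf(1 - alpha / 2)

# Rejection region for p-hat, built from the sampling distribution under H0
sd_null = sqrt(pi0 * (1 - pi0) / n)
lower_cut = pi0 - z_star * sd_null
upper_cut = pi0 + z_star * sd_null

# Power: chance that p-hat lands in the rejection region when pi = 0.65
sd_alt = sqrt(pi_alt * (1 - pi_alt) / n)
alt = NormalDist(mu=pi_alt, sigma=sd_alt)
power = alt.cdf(lower_cut) + (1 - alt.cdf(upper_cut))
print(f"Reject H0 if p-hat <= {lower_cut:.3f} or p-hat >= {upper_cut:.3f}")
print(f"Approximate power when pi = .65: {power:.3f}")
```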

(u) How would the power in (t) change if you changed the level of significance to .01? You should explain/justify without performing any additional calculations.

(v) Use the normal approximation to determine how large the sample size n needs to be for the 95% confidence interval to have a margin of error less than .08.
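A sketch for (v), assuming the conservative guess π = .5 (which makes π(1 - π) as large as possible) and solving z* × sqrt(π(1 - π)/n) < .08 for n. Because the margin must be strictly less than .08, the code rounds down and adds one rather than simply rounding up.

```python
from math import floor, sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, pi_guess=0.5):
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(pi_guess * (1 - pi_guess) / n)

def sample_size_for_margin(margin, confidence=0.95, pi_guess=0.5):
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star / margin) ** 2 * pi_guess * (1 - pi_guess)
    # Smallest integer strictly greater than n_exact, so the margin ends up below the target
    return floor(n_exact) + 1

n_needed = sample_size_for_margin(0.08)
print(n_needed, round(margin_of_error(n_needed), 4))
```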

3) Findings at James Madison University indicate that 21% of students eat breakfast 6 or 7 times a week. A similar question was asked of a random sample of 159 Cal Poly students. Of the 97 who responded, 35 reported eating breakfast 6 or 7 times a week. Is this convincing evidence that Cal Poly students have healthier breakfast habits (i.e., more likely to eat breakfast) than James Madison students? More specifically, are you convinced that more than 21% of all Cal Poly students eat breakfast 6 or 7 times weekly?

(a) Define the population of interest and the sample being considered.

(b) Define the parameter and the statistic for this study.

(c) Calculate the p-value for this test.

(d) What conclusion would you draw from this p-value?
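One way to obtain the p-value discussed in (d), sketched as an exact binomial calculation. It assumes the one-sided alternative π > .21 and treats the 97 respondents as the sample; the problem itself does not say which method to use.

```python
from math import comb

def binom_upper_tail(k, n, pi):
    """P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, i) * pi**i * (1 - pi)**(n - i) for i in range(k, n + 1))

n, x, pi0 = 97, 35, 0.21   # respondents, count eating breakfast 6 or 7 times, JMU benchmark
p_hat = x / n
p_value = binom_upper_tail(x, n, pi0)   # P(X >= 35) assuming pi = 0.21
print(f"p-hat = {p_hat:.3f}, one-sided p-value = {p_value:.5f}")
```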

(e) Provide an interpretation of this p-value as if to someone not taking a statistics class.

(f) If you took another random sample of 159 Cal Poly students, which of your answers to part b would change?

(g) What are your thoughts about the fact that only 97 out of the original random sample of 159 responded?

(h) Suppose you plan to conduct a new study with a simple random sample of 1,590 Cal Poly students. Explain how you could obtain this sample.
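A sketch of the idea behind (h): number every student on a complete list (a sampling frame) and let software choose 1,590 of them at random without replacement. The roster and its size below are hypothetical stand-ins; the problem does not give the actual enrollment list.

```python
import random

# HYPOTHETICAL sampling frame: the real list and its size are not given in the problem
N_STUDENTS = 20000
roster = [f"student_{i:05d}" for i in range(1, N_STUDENTS + 1)]

rng = random.Random(2024)       # fixed seed so the same draw can be reproduced
srs = rng.sample(roster, 1590)  # sampling without replacement; every set of 1,590 is equally likely
print(srs[:5])
```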

(i) Would this new sample size address the issue you identified in part g?

(j) How would you expect the p-value from part c to change (larger, smaller, or about the same) if 36% of the 1,590 Cal Poly students in your new sample reported eating breakfast 6 or 7 times a week? Explain without finding a new p-value.

4) A plastics manufacturer will change the warranty on his plastic trash cans if the data from 20 cans strongly suggests that fewer than 90% of such cans would survive the 6-year warranty period.

(a) How few of the 20 cans would need to survive to convince you to reject the null hypothesis?

(b) What is the power against the alternative value of 80%?
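A sketch for problem 4, assuming a lower-tailed test of π = .90 versus π < .90 and a significance level of .05 (the problem does not state one). The rejection region uses the largest count k whose null probability P(X ≤ k) stays at or below .05, and the power is that same tail probability computed with π = .80.

```python
from math import comb

def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, i) * pi**i * (1 - pi)**(n - i) for i in range(k + 1))

n, pi0, alpha = 20, 0.90, 0.05   # alpha = 0.05 is an assumption

# Reject H0 when the number of surviving cans is <= k_crit
k_crit = max(k for k in range(n + 1) if binom_cdf(k, n, pi0) <= alpha)
actual_alpha = binom_cdf(k_crit, n, pi0)

# Power: probability of landing in the rejection region when 80% of cans survive
power = binom_cdf(k_crit, n, 0.80)
print(f"Reject H0 if {k_crit} or fewer survive (alpha = {actual_alpha:.4f}); power = {power:.4f}")
```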