Stat 320 -- Mini-Project 1

Designing and analyzing a study

Goal: To collect, describe, and analyze data using the methods of Chapter 1.

Teams: You are to work in teams of 2-3 people. It is up to the members of the group to make sure everyone contributes equally. Plan your schedules so that you will have time to work together on the project outside of class. Teams should be formed and project topics selected by April 4. You may be asked to share your proposals with the rest of the class. You are also encouraged to share your ideas with me before you begin collecting any data. Please start early so you have time to ask questions. You should have your data collected by April 11. You may have time to work on the data analysis in class April 12.

The Study: You are free to choose your own topic. You should think of two groups that you can compare on one categorical variable through an experiment or an observational study. Make sure you choose a topic for which it is feasible to gather the data in a relatively short period of time. The question may be related to your major or some other topic of interest. For example, you could observe men and women on campus to determine whether they are left- or right-handed, or you could randomly assign people to take a survey with two different wordings and see if they respond differently depending on how the question is asked. Your study must obtain at least 10 observational/experimental units in each explanatory variable group.

Final Report: Due April 13. This should be a typed report, written collaboratively by all team members. Your report should be written as if it will be read by other student researchers. Make sure it includes at least:

I. Introduction – Why did you choose this topic? What did you expect to find? Have similar studies been done elsewhere? Why should the reader be interested in your results and continue reading?

II. Summary of Data Collection Methods – How did you collect the data? What were the experimental/observational units? What groups did you compare, how did you find them/form them? Was this a prospective or retrospective study? Observational or Experiment? What was your response variable? How were these variables measured? What additional “controls” did you exert on the study? (E.g., did you only observe people writing or did you take any behavior such as throwing a football as indication of handedness?) Any “operational definitions”? (E.g., did you pre-test any of the questions on a test group to see if the wording was clear?) Did you have any problems with non-response or other unexpected results? Did anything go wrong during the course of the study? (Note: You can never give me too much detail in this section!) In particular, there should be enough information that someone else could replicate your study on their own based only on your description (and hopefully improve upon it based on your suggestions below).

III. Analysis of Results – Include appropriate numerical and graphical summaries of your data, including the two-way table. Write several paragraphs explaining what you found in these data. Use both simulation (using the Java applet) and Fisher’s Exact Test (using Minitab) to analyze your results, reporting both the approximate and exact p-value (and include the output – you can make a screen capture of the applet window using the Prnt Scrn key on the keyboard). Include a careful interpretation of what this p-value tells you. Is the difference between the groups statistically significant? What conclusions can you draw? Be sure to refer back to the type of study conducted in explaining the scope of your conclusions. Address both the question of causation and the question of whether you believe your findings generalize to a larger population. (Note: All computer output should be included in the body of the report. Make sure all figures and graphs are clearly labeled.)
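As a rough illustration of the two analyses requested in Section III, the Python sketch below computes a one-sided Fisher's Exact Test p-value and a shuffle-based approximation of it. It is not the course applet or Minitab, and it does not replace the output you are asked to include. The counts (8 of 10 successes in one group, 3 of 10 in the other), the function names, the seed, and the 10,000 repetitions are all made up for illustration.

```python
import random
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """Exact P(group-1 successes >= a) with both table margins held fixed.

    Two-way table layout (hypothetical):
                 success  failure
        group 1     a        b
        group 2     c        d
    """
    n1 = a + b                  # size of group 1
    successes = a + c           # total successes in the table
    total = a + b + c + d
    upper = min(n1, successes)  # largest possible count in the top-left cell
    favorable = sum(
        comb(successes, k) * comb(total - successes, n1 - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(total, n1)


def simulated_p_value(a, b, c, d, reps=10_000, seed=0):
    """Approximate the same p-value by repeatedly reshuffling the outcomes."""
    rng = random.Random(seed)
    n1 = a + b
    cards = [1] * (a + c) + [0] * (b + d)  # 1 = success, 0 = failure
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # Deal the first n1 cards to group 1 and count its successes.
        if sum(cards[:n1]) >= a:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("exact p-value:      ", fisher_exact_one_sided(8, 2, 3, 7))
    print("approximate p-value:", simulated_p_value(8, 2, 3, 7))
```

The exact and approximate values should be close but not identical; the simulated value changes with the seed and the number of repetitions.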

IV. Conclusion – Summarize the results of your study. What did you learn? Did the data behave as you expected? Critique the methods used to collect the data. Is there anything you would do differently next time? How might this affect the conclusions of the study? What similar questions might someone choose to investigate in the future to build on your results?

Previous Project Ideas:

Compare people arriving to a location with elevators and stairs to see if one gender uses either mode of transportation more often.

Survey men and women to see if one gender tends to bicycle to campus more often.

Do freshmen call their parents more often than upperclassmen?

Are people more likely to agree if you randomly decide to ask them “Are you happy with your roommate” or to disagree if you ask them “Are you unhappy with your roommate?”

Are people more likely to support the war in Iraq if you phrase the question differently?

Are freshmen more likely to think they will change their major at Cal Poly than transfer students?

Does listening to classical music while studying a group of words improve whether subjects can pass a threshold level of recall?

Do self-identified chocolate lovers prefer the taste of Ghirardelli chocolate to Nestle’s?

Are people more likely to loan you money for a phone call depending on how you are dressed?

Are women more likely to respond if you sneeze near them?

Does class time have any effect on caffeine consumption and in what form do students get their caffeine, soda or coffee?

Tell people they are tasting two different types of muffins and see if they are more likely to predict the second taste of the same muffin.

Are people using their cell phone less likely to come to a complete stop at a four way intersection?

Mini-Project 2 will apply methods from Ch. 3 but you might want to begin thinking of topics now. Project 2 will involve taking a random sample (in the true sense of the word) from a well-defined population and measuring a categorical variable. Before you collect the data you should make a conjecture as to the value of the population proportion.

(e.g., more than 60% of Cal Poly students own a cell phone, more than 1/3 of TV commercials during the NBA playoffs last longer than 30 seconds, less than 75% of San Luis drivers come to a complete stop at intersections next to campus, a majority of college students can distinguish between the taste of Coke and Pepsi, more than half of a random sample of products are more expensive at Scolari’s than at Lucky’s).

The sample size should be at least 30 and the population should be at least 20 times the size of your sample. The type of study can be an experiment, an observational study, or a survey. The key requirement will be that you randomly select the observational units from the larger population. (Note: the sample does not have to consist of humans. You should be very careful in how you define your population.) You are free to choose your own topic(s). The topic may be related to your major or another topic of interest. Make sure you choose a topic so that it is straightforward to gather the data or you have access to data from another class or professor. You may work with up to two other people.
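One way to meet the random-selection requirement is to number every unit in the sampling frame and let software choose the sample. The Python sketch below draws a simple random sample and checks the two size conditions stated above (sample of at least 30, population at least 20 times the sample). The frame of 1,000 ID numbers, the function name, and the seed are hypothetical.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement, each subset of size n equally likely."""
    if n < 30:
        raise ValueError("sample size should be at least 30")
    if len(frame) < 20 * n:
        raise ValueError("population should be at least 20 times the sample size")
    rng = random.Random(seed)
    return rng.sample(frame, n)


if __name__ == "__main__":
    # Hypothetical sampling frame: units numbered 1 through 1000.
    frame = list(range(1, 1001))
    sample = draw_simple_random_sample(frame, 30, seed=320)
    print(sorted(sample))
```

Recording the seed in your report lets someone else reproduce exactly which units were selected.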

Final Report: Due May 10. This should be a typed report, written collaboratively by all team members. Your report should be written as if it will be read by other student researchers. Make sure it includes at least:

I. Introduction

Same guidelines as last time. You should describe the population parameter of interest, an initial conjecture for its value (one that makes sense in the context), and whether you suspect the actual value is higher or lower than (or just different from) this conjectured value.

II. Data Collection Methods

Same rules as last time: remember to tell me everything, good and bad. Think about designing a study protocol that someone else could follow to carry out exactly the same study you did. In your discussion, be sure to define your observational units, variable of interest, population of interest, sampling frame (if applicable), and parameter of interest. Which type of probability sampling method did you use (SRS, stratified, cluster, systematic)? If you designed a survey, are there any potential wording issues? Did you “field-test” the questions first? How did you ensure confidentiality or take other precautions to ensure honest responses? What was the response rate? How often did you have to make repeat visits in order to obtain the observational units initially selected? Are there any other potential sources of sampling or non-sampling errors?

III. Analysis of Results

Descriptive Statistics

You will need to make choices as to which numerical and graphical summaries are most relevant. Make sure you integrate the output into the body of the report and include discussions of how you are interpreting the message in these summaries. In your discussion you should fully describe your sample, sample size, and report the sample statistic and whether it supports your conjecture.

Inferential Statistics

In carrying out the binomial test and interval:

- define the population and parameter in words

- state your conjectured value about the parameter and what it signifies.

- state whether you suspected (before you saw the data) that the actual value of the parameter was higher or lower than this conjectured value. If you had no prior direction in mind, then you will calculate a two-sided p-value.

- state what a type I and a type II error would represent in this setting.

- discuss whether or not your measurements can be considered observations from a Bernoulli process or from a large population.

- calculate a binomial probability to represent the p-value corresponding to the direction of your conjecture. Include an interpretation of what this p-value represents.

- use Minitab to calculate a confidence interval to describe the plausible values of your population parameter.

- state your conclusions in context.
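The binomial calculation in the steps above can be sketched in Python as follows. This is only an illustration: the numbers (23 successes in a sample of 30, conjectured value 0.6) and the function names are invented. The interval shown is the large-sample (Wald) interval with a 1.96 multiplier, which is an assumption of this sketch and may differ from the interval Minitab reports.

```python
from math import comb, sqrt


def binomial_upper_tail(k, n, p0):
    """P(X >= k) when X ~ Binomial(n, p0); use when you suspect a higher value."""
    return sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(k, n + 1))


def binomial_lower_tail(k, n, p0):
    """P(X <= k) when X ~ Binomial(n, p0); use when you suspect a lower value."""
    return sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(0, k + 1))


def wald_interval(k, n, z=1.96):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Hypothetical data: 23 of 30 sampled students own a cell phone,
    # and the conjecture is that the population proportion exceeds 0.6.
    print("one-sided p-value:", binomial_upper_tail(23, 30, 0.6))
    print("approximate 95% interval:", wald_interval(23, 30))
```

The p-value here is the probability of seeing 23 or more successes in 30 trials if the population proportion really were 0.6, which is the quantity your written interpretation should describe.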

IV. Conclusion

Same guidelines as before. Pay particular attention to whether the conditions were satisfied for you to generalize from your sample to the larger population. Also discuss whether the Bernoulli conditions were met and whether the p-value reflects true randomness in the study or is more of a fiction: a measure of how much chance variability there would have been had the study involved randomness (that is, it measures uncertainty, but you do not really think it is reasonable to generalize from your sample to your population). Make sure you include a critique of the study you did, as well as suggestions for future studies.

Stat 320 – Mini-Project 3

Due Friday, June 10 or before

This project is to be completed individually. The goal is to design two separate studies. You do not have to carry out either study. You are to turn in a typed proposal for each design. You may submit your report via email. If you submit the report early, I will do my best to provide feedback to you prior to June 10.

You will be assessed on the correctness of your designs, whether they are appropriate for the research question proposed, whether they would have been feasible for a researcher to carry out, and creativity in your research questions and designs. For feasibility, it does not need to be feasible for you to carry out at Cal Poly, but for someone to carry out with more time and resources (e.g., comparing the mating patterns of African and European bees). You should concentrate more on finding a research question of interest to you and justifying why it is of interest, and designing an appropriate study to answer that question.

The requirements for the designs are:

One should be a randomized experiment and one should be an observational study with random selection.

One should involve comparing two groups on a quantitative response variable and one should involve comparing two groups on a categorical response variable.

You should consider how you would analyze the data obtained from your study using the statistical methods we have discussed this quarter.

The two topics may or may not be related.
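For the randomized experiment design, the protocol should say how units are assigned to groups. A minimal Python sketch of one option, shuffling the units and splitting the list in half, is shown below. The 20 subject labels, the function name, and the seed are hypothetical, and other randomization schemes are equally acceptable.

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle the experimental units and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    # Hypothetical experimental units: 20 volunteers labeled S1 through S20.
    subjects = [f"S{i}" for i in range(1, 21)]
    treatment, control = randomly_assign(subjects, seed=320)
    print("treatment:", treatment)
    print("control:  ", control)
```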

In your report, make sure you:

State the research question for each design.

Provide extensive detail for each design. This design protocol should be detailed enough that I could hand it to someone and they would be able to carry it out exactly to your specifications.

Use appropriate statistical terminology (e.g., observational/experimental units, explanatory and response variables, randomization and random sampling, sampling frame, sample size).

Indicate the methods you would use to analyze the data both descriptively (numerical and graphical summaries) and inferentially (e.g., will you use small sample or large sample techniques and why).

State the conclusions you would draw should the difference between groups prove to be statistically significant. Clarify whether you would draw a cause-and-effect conclusion and to what population you would generalize the result. If you do not feel a cause-and-effect conclusion is warranted, suggest potential confounding variables. You should also indicate how you would decide whether this difference is of practical significance.