Stat 320 -- Mini-Project 1
Designing and analyzing a study
Goal: To collect, describe, and analyze data using the
methods of Chapter 1.
Teams: You are to work in teams of 2-3 people. It is up to
the members of the group to make sure everyone contributes equally. Plan
your schedules so that you will have time to work together on the project
outside of class. Teams should be formed and project topics selected by
April 4. You may be asked to share your proposals with the rest of the
class. You are also encouraged to share your ideas with me before you begin
collecting any data. Please start early so you have time to ask questions. You
should have your data collected by April 11. You may have time to work on
the data analysis in class April 12.
The Study: You are free to choose your own topic. You should
think of two groups that you can compare with one categorical variable through
an experiment or an observational study. Make sure you choose a topic for
which it is feasible to gather the data in a relatively short period of time. The
question may be related to your major or some other topic of interest.
For example, you could observe men and women on campus to determine whether
they are left or right handed, or you could randomly assign people to take a
survey with two different wordings and see if they respond differently
depending on how the question is asked. Your study must obtain at least
10 observational/experimental units in each explanatory variable group.
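One way to carry out the random assignment described above (for example, deciding who receives which survey wording) is sketched below in Python. The roster labels, group names, and seed are hypothetical; any procedure that gives every person the same chance of landing in either group would serve equally well.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two wording groups."""
    rng = random.Random(seed)      # seeded generator so the assignment can be reproduced
    shuffled = list(participants)  # copy so the original roster is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"wording_A": shuffled[:half], "wording_B": shuffled[half:]}

# Hypothetical roster of 24 volunteers labeled P01..P24 (not real data).
roster = [f"P{i:02d}" for i in range(1, 25)]
groups = randomly_assign(roster, seed=320)

for name, members in groups.items():
    # Each group should meet the minimum of 10 units per group.
    print(name, len(members), members)
```

Recording the seed (or the shuffled list itself) in the report makes it easier for someone else to replicate the assignment step.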
Final Report: Due April 13. This should be a typed
report, written collaboratively by all team members. Your report should
be written as if to other student researchers. Make sure it includes at
least:
I. Introduction – Why did you choose
this topic? What did you expect to find? Have similar studies been
done elsewhere? Why should the reader be interested in your results and
continue reading?
II. Summary of Data Collection Methods – How did you collect the data? What were the
experimental/observational units? What groups did you compare, how did
you find them/form them? Was this a prospective or retrospective
study? Was it an observational study or an experiment? What
was your response variable? How were the explanatory and response variables measured? What
additional “controls” did you exert on the study? (E.g., did you only
observe people writing or did you take any behavior such as throwing a football
as indication of handedness?) Any “operational definitions”?
(E.g., did you pre-test any of the questions on a test group to see if the
wording was clear?) Did you have any problems with non-response or other
unexpected results? Did anything go wrong during the course of the
study? (Note: You can never give me too much detail in this
section!) In particular, there should be enough information that someone
else could replicate your study on their own based only on your description
(and hopefully improve upon it based on your suggestions below).
III. Analysis of Results – Include appropriate numerical and graphical summaries of your data,
including the two-way table. Write several paragraphs explaining what you
found in these data. Use both simulation (using the Java applet) and
Fisher’s Exact Test (using Minitab) to analyze your results, reporting both the
approximate and exact p-value (and include the output – you can make a screen
capture of the applet window using the Prnt Scrn key on the keyboard). Include a careful
interpretation of what this p-value tells you. Is the difference between the
groups statistically significant? What conclusions can you draw? Be
sure to refer back to the type of study conducted in explaining the scope of
your conclusions. Address both the question of causation and the question
of whether you believe your findings generalize to a larger population. (Note:
All computer output should be included in the body of the report. Make
sure all figures and graphs are clearly labeled.)
IV. Conclusion – Summarize the results of
your study. What did you learn? Did the data behave as you
expected? Critique the methods used to collect the data. Is there
anything you would do differently next time? How might this affect the
conclusions of the study? What similar questions might someone choose to
investigate in the future to build on your results?
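As a rough illustration of the analysis asked for in section III, the Python sketch below computes a one-sided Fisher's Exact Test p-value from the hypergeometric distribution and approximates the same p-value by reshuffling the outcomes between the two groups. It does not replace the applet or Minitab output required in the report. The table counts, function names, and seed are made up for illustration, and the one-sided direction (more successes in group 1) is an assumption.

```python
import random
from math import comb

def hypergeom_pmf(k, total, successes, group_size):
    """P(exactly k successes land in group 1) when all table margins are fixed."""
    return comb(successes, k) * comb(total - successes, group_size - k) / comb(total, group_size)

def fisher_exact_one_sided(a, b, c, d):
    """Exact p-value: P(group-1 successes >= a) for the table [[a, b], [c, d]]."""
    total, successes, group_size = a + b + c + d, a + c, a + b
    upper = min(successes, group_size)
    return sum(hypergeom_pmf(k, total, successes, group_size) for k in range(a, upper + 1))

def simulated_p_value(a, b, c, d, reps=10_000, seed=None):
    """Approximate p-value: reshuffle outcomes and count results at least as extreme."""
    rng = random.Random(seed)
    outcomes = [1] * (a + c) + [0] * (b + d)  # 1 = success, 0 = failure
    group_size = a + b
    hits = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        if sum(outcomes[:group_size]) >= a:   # first group_size cards play the role of group 1
            hits += 1
    return hits / reps

# Hypothetical two-way table (not real data):
#            success  failure
# group 1       9        3
# group 2       4        8
exact_p = fisher_exact_one_sided(9, 3, 4, 8)
approx_p = simulated_p_value(9, 3, 4, 8, reps=10_000, seed=320)
print(f"exact p-value:     {exact_p:.4f}")
print(f"simulated p-value: {approx_p:.4f}")
```

With enough repetitions the simulated value should land close to the exact one, which is the comparison the report asks you to make.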
Previous Project Ideas:
Compare people arriving to a location with elevators
and stairs to see if one gender uses either mode of transportation more often.
Survey men and women to see if one gender tends to
bicycle to campus more often
Do freshmen call their parents more often than upperclassmen?
Are people more likely to agree if you randomly decide
to ask them “Are you happy with your roommate” or to disagree if you ask them
“Are you unhappy with your roommate?”
Are people more likely to support the war in
Are freshmen more likely to think they will change
their major at Cal Poly than transfer students?
Does listening to classical music while studying a
group of words improve whether subjects can pass a threshold level of recall?
Do self-identified chocolate lovers prefer the taste
of Ghirardelli chocolate to Nestle’s?
Are people more likely to loan you money for a phone
call depending on how you are dressed?
Are women more likely to respond if you sneeze near
them?
Does class time have any effect on caffeine
consumption and in what form do students get their caffeine, soda or coffee?
Tell people they are tasting
two different types of muffins and see if they are more likely to prefer the
second taste of the same muffin.
Are people using their cell phone less likely to come
to a complete stop at a four way intersection?
Mini-Project 2 will apply methods from Ch. 3 but you might want to
begin thinking of topics now. Project 2 will involve taking a random sample
(in the true sense of the word) from a well-defined population and measuring a
categorical variable. Before you collect the data you should make a
conjecture as to the value of the population proportion (e.g.,
more than 60% of Cal Poly students own a cell phone, more than 1/3 of TV
commercials during the NBA playoffs last longer than 30 seconds, less than 75%
of San Luis drivers come to a complete stop at intersections next to campus, a
majority of college students can distinguish between the taste of Coke and
Pepsi, more than half of a random sample of products are more expensive at
Scolari’s than at Lucky’s).
The sample size should be at
least 30 and the population should be at least 20 times the size of your
sample. The type of study can be an experiment, an observational study,
or a survey. The key requirement will be that you randomly select the
observational units from the larger population. (Note: the sample does
not have to consist of humans. You should be very careful in how you
define your population.) You are free to choose your own topic(s).
The topic may be related to your major or another topic of interest. Make
sure you choose a topic so that it is straightforward to gather the data or you
have access to data from another class or professor. You may work with up
to two other people.
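The size requirements and the random selection step described above can be sketched in Python as follows. The sampling frame, its size, the sample size, and the seed are hypothetical; in a real study the frame would be an actual list of every unit in your population.

```python
import random

def check_size_conditions(sample_size, population_size):
    """True if n >= 30 and the population is at least 20 times the sample size."""
    return sample_size >= 30 and population_size >= 20 * sample_size

def simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from the sampling frame."""
    rng = random.Random(seed)
    return rng.sample(list(frame), sample_size)

# Hypothetical sampling frame of 1200 numbered units (not real data).
frame = [f"unit_{i:04d}" for i in range(1, 1201)]
n = 40

if check_size_conditions(n, len(frame)):
    sample = simple_random_sample(frame, n, seed=320)
    print(f"Selected {len(sample)} of {len(frame)} units; first five: {sample[:5]}")
else:
    print("Sample or population size does not meet the project requirements.")
```

Because the selection is made from a numbered frame, every unit has the same chance of being chosen, which is what distinguishes a random sample from a convenience sample.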
Final Report: Due May 10. This should be a typed
report, written collaboratively by all team members. Your report should
be written as if it will be read by other student researchers. Make sure it
includes at least:
I. Introduction
Same guidelines as last time. You should describe the population parameter of
interest, an initial conjecture for its value (that makes sense in the context)
and whether you suspect the actual value is higher or lower (or just different)
than this conjectured value.
II. Data Collection Methods
Same rules as last time: remember to tell me everything, good and bad. Think
about designing a study protocol where someone else could mimic exactly
the same study that you carried out. In your discussion, be sure to
define your observational units, variable of interest, population of interest,
sampling frame (if applicable), and parameter of interest. Which type of
probability sampling method did you use (SRS, stratified, cluster,
systematic)? If you designed a survey, are there any potential wording
issues? Did you “field-test” the questions first? How did you
ensure confidentiality or take other precautions to ensure honest
responses? What was the response rate? How often did you have to
make repeat visits in order to obtain the observational units initially
selected? Are there any other potential sources of sampling or
non-sampling errors?
III. Analysis of Results
Descriptive Statistics
You will need to make choices as to which numerical and graphical summaries are
most relevant. Make sure you integrate the output into the body of the
report and include discussions of how you are interpreting the message in these
summaries. In your discussion you should fully describe your sample and
sample size, report the sample statistic, and state whether it supports your
conjecture.
Inferential Statistics
In carrying out the binomial test and interval:
- define the population and parameter in words.
- state your conjectured value about the parameter and what it signifies.
- state whether you suspected (before you saw the data) that the actual value
  of the parameter was higher or lower than this conjectured value. If you had
  no prior direction in mind, then you will calculate a two-sided p-value.
- state what a type I and a type II error would represent in this setting.
- discuss whether or not your measurements can be considered observations from
  a Bernoulli process or from a large population.
- calculate a binomial probability to represent the p-value corresponding to
  the direction of your conjecture. Include an interpretation of what this
  p-value represents.
- use Minitab to calculate a confidence interval to describe the plausible
  values of your population parameter.
- state your conclusions in context.
IV. Conclusion
Same guidelines as before. Pay particular attention to whether or not the
conditions were satisfied for you to generalize your sample to the larger
population. Also discuss whether or not the Bernoulli conditions were met
and whether the p-value represents true randomness in the study or whether
the p-value is more fictitious, used to measure the amount of chance
variability there would have been had there been randomness (that is, it
measures the uncertainty, but you do not really think it is reasonable to
generalize from your sample to your population). Make sure you include a
critique of the study you did, as well as suggestions for future studies.
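To make the binomial calculations in section III concrete, here is a minimal Python sketch of an exact binomial p-value and an exact (Clopper-Pearson) confidence interval found by bisection. The counts and conjectured value are hypothetical, the two-sided rule used here (summing outcomes no more likely than the observed one) is one common convention, and the interval method is an assumption that may differ from what Minitab reports.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_p_value(x, n, p0, alternative="greater"):
    """Exact binomial p-value for x successes in n trials under conjectured value p0."""
    if alternative == "greater":
        return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    if alternative == "less":
        return sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    # Two-sided: add up every outcome that is no more likely than the observed one.
    observed = binom_pmf(x, n, p0)
    probs = (binom_pmf(k, n, p0) for k in range(n + 1))
    return min(1.0, sum(pk for pk in probs if pk <= observed * (1 + 1e-9)))

def _bisect(f, lo=0.0, hi=1.0, tol=1e-10):
    """Find the root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact interval: the values of p not rejected by either one-sided test at alpha/2."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_p_value(x, n, p, "greater") - alpha / 2)
    upper = 1.0 if x == n else _bisect(lambda p: alpha / 2 - binom_p_value(x, n, p, "less"))
    return lower, upper

# Hypothetical result: 26 of 35 sampled units are "successes"; conjecture: more than 0.60.
x, n, p0 = 26, 35, 0.60
p_value = binom_p_value(x, n, p0, alternative="greater")
low, high = clopper_pearson(x, n, conf=0.95)
print(f"sample proportion = {x / n:.3f}, one-sided p-value = {p_value:.4f}")
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

The p-value here is the probability of seeing a count at least as extreme as the observed one if the conjectured value were the true proportion, which is the interpretation the report should spell out in context.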
Stat 320 – Mini-Project 3
Due Friday, June 10 or before
This project is to be
completed individually. The goal is to design two separate
studies. You do not have to carry out either study. You are to turn
in a typed proposal for each design. You may submit your report via
email. If you submit the report early, I will do my best to provide
feedback to you prior to June 10.
You will be assessed on the
correctness of your designs, whether they are appropriate for the research
question proposed, whether they would have been feasible for a researcher to
carry out, and creativity in your research questions and designs. A design
does not need to be feasible for you to carry out at Cal Poly, only
for someone with more time and resources (e.g., comparing the
mating patterns of African and European bees). You should concentrate
more on finding a research question of interest to you and justifying why it is
of interest, and designing an appropriate study to answer that question.
The requirements for the
designs are:
One should be a randomized experiment and one should
be an observational study with random selection.
One should involve comparing two groups on a
quantitative response variable and one should involve comparing two groups on a
categorical response variable.
You should consider how you would analyze the data
obtained from your study using the statistical methods we have discussed this
quarter.
The two topics may or may not be related.
In your report, make sure
you:
State the research question for each design.
Provide extensive detail for each design. This
design protocol should be detailed enough that I could hand it to someone and
they would be able to carry it out exactly to your specifications.
Use appropriate statistical terminology (e.g.,
observational/experimental units, explanatory and response variables,
randomization and random sampling, sampling frame, sample size).
Indicate the methods you would use to analyze the data
both descriptively (numerical and graphical summaries) and inferentially (e.g.,
will you use small sample or large sample techniques and why).
State the conclusions you would draw should the
difference in groups prove to be statistically significant. Clarify
whether you would draw a cause and effect conclusion and to what population you
would generalize the result. If you do not feel a cause and effect
conclusion is warranted, suggest potential confounding variables. You should
also indicate how you would decide whether this difference is of practical
significance.
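For the design that compares two groups on a quantitative response, one possible inferential analysis is a randomization test on the difference in group means, sketched below in Python. This is only one option and an assumption on my part; the methods covered this quarter (for example, a two-sample t procedure) may be more appropriate for your design. The measurements, function name, and seed are hypothetical.

```python
import random

def randomization_test_means(group_a, group_b, reps=10_000, seed=None):
    """Two-sided randomization test for a difference in means between two groups."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-randomize which responses belong to which group
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return observed, extreme / reps

# Hypothetical quantitative responses for two treatment groups (not real data).
treatment = [12.1, 14.3, 11.8, 15.0, 13.6, 14.9, 12.7, 13.9, 15.4, 14.1]
control = [11.2, 12.0, 10.9, 13.1, 11.7, 12.4, 10.5, 12.8, 11.9, 12.2]

diff, p_value = randomization_test_means(treatment, control, reps=10_000, seed=320)
print(f"observed difference in means = {diff:.2f}, approximate p-value = {p_value:.4f}")
```

A small p-value would indicate statistical significance; whether a difference of that size matters in practice is a separate judgment that depends on the context of the research question.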