Stat 301 – Final Review
Optional Review Sessions: Sunday, 4-5:30pm, 10-215
Office Hours: Monday 10-12pm, Tuesday 10-11am, 1-2pm,
Wednesday 1-3pm, Thursday 10-12pm
Final Exam:
The exam will be in the lab classroom (10-215).
3pm section: Monday 1:10-4pm
2pm section: Friday 1:10-4pm
Format: There will be a 30-45 minute, closed
book multiple choice portion on basic concepts.
Once you submit that portion, you may use three pages of notes, calculator, JMP, R, applets. Be ready to interpret output, explain
processes, carry out analyses with JMP/R/applets, justify conclusions, and
explain reasoning.
The final exam
will be cumulative, including a comparison of methods across chapters. You
should focus on the entire statistical process: How do we design a study to achieve
particular research goals? How do we
describe the data we have? How do we test claims about population parameters or
processes and/or estimate parameters?
How do we state our final conclusions, especially after considering how
the study was conducted? You should also
think about the reasoning behind the statistical methods, such as
standardizing, significance, and confidence in general.
Advice:
Understand and be able
to apply the procedures first, then worry about more subtle issues and review
the how and why behind the development of the procedure. Also, for all the procedures, know what must
be done by hand and what can be done on the computer. Review the Case Study
(trying the questions from scratch) as well as the “final exam multiple choice
practice” in Canvas.
From Section 4.1 you should
know that
· With
a quantitative response variable, you can create an “exact randomization”
distribution if you list all possible random assignments to the two groups,
identify an appropriate statistic and calculate for each possible random
assignment, count how many of the random assignments give a statistic at least
as extreme as the one you observed, and calculate a p-value
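As a rough illustration of the exact randomization steps above, here is a minimal Python sketch. The function name and the tiny data set are made up for illustration, and the choice of statistic (difference in group means) and a two-sided count are assumptions, not the only options.

```python
from itertools import combinations
from statistics import mean


def exact_randomization_p_value(group_a, group_b):
    """Two-sided exact p-value using the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = mean(group_a) - mean(group_b)

    n_assignments = 0
    n_extreme = 0
    # Each choice of n_a positions for group A is one possible random assignment.
    for positions in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in positions)
        diff = sum_a / n_a - (total - sum_a) / n_b
        n_assignments += 1
        # Count assignments at least as extreme as the observed statistic
        # (small tolerance guards against floating-point ties).
        if abs(diff) >= abs(observed) - 1e-12:
            n_extreme += 1
    return n_extreme / n_assignments


# Hypothetical responses, made up for illustration only.
p_value = exact_randomization_p_value([1, 2, 3], [4, 5, 6])
```

With three observations per group there are only 20 possible random assignments, so listing them all is feasible. The count grows quickly with larger groups, which is why a simulation is usually used instead.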
From Section 4.2 you should know how to
· Create numerical and graphical summaries comparing two groups for a quantitative variable
  o Compare distributions in terms of shape, center, and spread (citing appropriate numerical evidence when available)
  o Including what is meant by “variability” in a distribution and different ways it can be measured
  o Make sure your comments are always in context (including measurement units)
  o Consider and possibly explain outliers
· Predict the behavior of the sampling distribution of the difference in two sample means from random sampling
  o Including whether it should be normal or approximately normal
  o Reasoning behind the standard error formula (adding variances); it should make sense to you that the variability of the difference is larger than the variability of either sample mean alone
· Carry out a two-sample t-test for the difference in population means using technology (JMP, R, TBI applet)
  o State null and alternative hypotheses
  o Assess validity of the procedure
  o Raw data vs. summary statistics
  o Interpret results, including interpretations of test statistic and p-value (in terms of random sampling)
· Factors that influence p-value and confidence interval midpoint and width
· Determine and interpret a two-sample t-confidence interval
  o Make sure confidence level, mean, variable (context), and “direction” are clear in interpretation
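To make the "adding variances" idea concrete, here is a small Python sketch that works from summary statistics. It assumes the unpooled standard error and uses 2 as a rough 95% multiplier in place of the exact t* that JMP, R, or the TBI applet would supply. The function name and the numbers are hypothetical.

```python
from math import sqrt


def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2, hypothesized=0.0):
    """Standard error, t-statistic, and rough 95% interval for mean1 - mean2."""
    # Add the variances of the two sample means, then take the square root.
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    estimate = mean1 - mean2
    t_stat = (estimate - hypothesized) / se
    # Rough interval: estimate +/- 2 standard errors (assumption: 2 stands in for t*).
    interval = (estimate - 2 * se, estimate + 2 * se)
    return se, t_stat, interval


# Hypothetical summary statistics, made up for illustration only.
se, t_stat, interval = two_sample_summary(10.0, 3.0, 9, 7.0, 4.0, 16)
```

In this made-up example each sample mean has a standard error of 1, and the standard error of the difference is larger than either one. The interval is for mean1 minus mean2, so the direction of subtraction matters when interpreting it.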
From Section 4.3 you should be able to
· Set up and explain the reasoning process behind a randomization test
  o How the simulation models the null hypothesis being true/random assignment in the study
  o How to simulate the null distribution (random assignment vs. random sampling)
  o How to use the generated null distribution to calculate an empirical p-value (one or two-sided)
  o Provide a detailed interpretation of what the p-value is measuring in the context of the research study (and random assignment)
  o Make a decision about the null hypothesis based on the p-value
  o Be able to do this for difference in sample means, sample medians, or other sample statistics
· Summarize study conclusions in terms of significance, estimation, causation, and generalizability
· Check the technical conditions to assess the validity of the t procedures
· Carry out two-sample t-tests for comparing two long-run treatment means using technology (JMP, R, TBI applet)
  o Consider alternatives if technical conditions are not met (e.g., randomization test, transformations)
  o Be able to test hypothesized differences other than zero
· Calculate (using JMP, R, TBI applet) and interpret two-sample t-confidence intervals for the difference in long-run treatment means
  o Explore a data transformation for improving the validity of a t-procedure
  o Interpret a confidence interval after a log transformation in terms of the original units (multiplicative change in median)
· Explain the effects of the difference in sample means, sample sizes, and within group variability on the p-value, confidence interval, and power
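The randomization test logic in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is the difference in sample means, the p-value is two-sided, and the function name, seed, and data are hypothetical.

```python
import random
from statistics import mean


def randomization_p_value(group_a, group_b, reps=10000, seed=0):
    """Empirical two-sided p-value from re-randomizing the responses."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    n_extreme = 0
    for _ in range(reps):
        # Under the null hypothesis the group labels do not matter,
        # so shuffle the responses and deal them back into two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:
            n_extreme += 1
    return n_extreme / reps


# Hypothetical responses, made up for illustration only.
p_value = randomization_p_value([1, 2, 3], [4, 5, 6])
```

Swapping `mean` for `median` (also in the `statistics` module) gives the same test for a difference in sample medians.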
From Section 4.4 you should be able to:
· Distinguish between matched pairs designs and two-independent “samples” (or completely randomized) designs
  o Consider benefits and disadvantages (including feasibility) of the two types of designs
  o Be able to describe how to set up a matched-pairs design, including randomizing the order of the treatments if repeated observations
· Determine whether a data collection plan necessitates a matched pairs analysis
· Create numerical and graphical summaries comparing matched pairs data by calculating and examining the differences
· Carry out and interpret a matched pairs test simulation
  o Logic behind it
  o Interpreting the results
· Use technology (JMP, R, TBI applet on the differences, Matched Pairs applet) to carry out and interpret a matched pairs t-test
· Use technology (JMP, R, TBI applet, Matched Pairs applet) to calculate and interpret a matched pairs t-interval
· Use technology to carry out and interpret a sign test with paired data
  o Exact binomial and/or normal approximation
· Set up a two-way table with paired categorical data (the two treatments are the two variables)
· Use technology to carry out and interpret McNemar’s test
  o Exact binomial and/or normal approximation
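The sign test and McNemar's test both reduce to an exact binomial calculation with success probability 0.5. Here is a minimal Python sketch of that shared calculation. The function names are hypothetical, zero differences are dropped in the sign test (a common convention, assumed here), and the two-sided p-value doubles the smaller tail.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials when H0 says probability 0.5."""
    # Binomial(n, 0.5) is symmetric, so double the smaller tail (capped at 1).
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p_tail)


def sign_test(differences):
    """Sign test on paired differences; zero differences are dropped (assumption)."""
    positives = sum(d > 0 for d in differences)
    negatives = sum(d < 0 for d in differences)
    return binomial_two_sided_p(positives, positives + negatives)


def mcnemar_exact(b, c):
    """Exact McNemar's test; b and c count the two kinds of differing (discordant) pairs."""
    return binomial_two_sided_p(b, b + c)
```

For McNemar's test, only the pairs whose two responses differ enter the calculation, which matches the "n = number of differing responses" idea in the summary table.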
From HW 7
· Be able to distinguish/explain bootstrapping vs. sampling from a finite (hypothetical) population
  o Key goal is estimating the sample to sample variation in the statistic
  o Center of bootstrap distribution is at observed statistic
  o Can compare to theoretical results, but latter is only available for certain statistics
· Using bootstrapping to compare two samples
  o Bootstrapping from each sample separately vs. pooling together the samples first
· Be able to anticipate/explain the shape of distribution of a variable (e.g., why it is not surprising that ages at death were skewed to the left)
· Keep in mind that a confidence interval doesn’t really give you “additional evidence” but it’s a different way of presenting the same evidence
· Keep in mind that we never can use the procedures in this course to establish evidence for the null hypothesis
· Remember to back up your statements describing/comparing distributions with appropriate numerical summaries
· Be able to distinguish the standard deviation of the sample, the pooled standard deviation for two samples, and the standard deviation of the distribution of the difference in sample means
· We can easily test non-zero values for the hypothesized difference in means
  o How to do this with a t-test
  o How to represent this in a simulation (e.g., need to remove the treatment effect, then groups should be equivalent so shuffle, and then add the treatment effect back)
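The "remove the treatment effect, shuffle, add it back" idea from the last bullet might look like the following Python sketch. It assumes an additive treatment effect and a one-sided (greater than) alternative. The function name, seed, and data are hypothetical.

```python
import random
from statistics import mean


def shifted_randomization_p_value(treatment, control, hypothesized_diff,
                                  reps=10000, seed=0):
    """One-sided p-value for H0: treatment mean - control mean = hypothesized_diff."""
    rng = random.Random(seed)
    n_t = len(treatment)
    observed = mean(treatment) - mean(control)

    # Step 1: remove the hypothesized treatment effect so the groups are equivalent.
    pooled = [x - hypothesized_diff for x in treatment] + list(control)

    n_extreme = 0
    for _ in range(reps):
        # Step 2: shuffle, since under H0 the adjusted responses are exchangeable.
        rng.shuffle(pooled)
        # Step 3: add the hypothesized effect back to the new "treatment" group.
        new_treatment = [x + hypothesized_diff for x in pooled[:n_t]]
        new_control = pooled[n_t:]
        if mean(new_treatment) - mean(new_control) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / reps


# Hypothetical data: testing a hypothesized difference of 5 and then of 0.
p_diff_5 = shifted_randomization_p_value([6, 7, 8], [1, 2, 3], 5)
p_diff_0 = shifted_randomization_p_value([6, 7, 8], [1, 2, 3], 0)
```

In this made-up example the observed difference is exactly 5, so a hypothesized difference of 5 gives a large p-value while a hypothesized difference of 0 gives a small one.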
Keep in mind
· When comparing distributions, remember to cite your evidence if you think there is a difference in the groups. In particular, tell me what you see in the summary statistics (e.g., a higher mean) that leads to your conclusion (e.g., sleep deprived subjects tend to have lower improvements)
· Remember that the confidence level refers to the reliability of the method – how often, in the long run, random samples will produce an interval that succeeds in capturing the population parameter
· Remember to think about/decipher the direction of subtraction used by the technology
· Why is variability in the data an important consideration and how can we reduce it?
· We can use a two-sample t-test even when the sample sizes are small if we have reason to believe the population distributions are themselves normally distributed. You can try to judge this, especially if you don’t have past experience with the variable, based on graphs of the sample data. If the sample data look plausibly normally distributed (normal probability plots are a useful tool for helping this judgment), you can cite this as evidence that the population distribution is normally distributed. If you aren’t sure, then use a simulation-based analysis instead.
· Keep in mind the two-sample t-test only compares the two means (vs. other aspects of the distributions or other measures of center)
· Try to avoid the word “accurate” without explaining exactly what you mean by it.
· Try to avoid use of the word “group” but clarify if you mean the sample or the population or the treatment in general
· Avoid use of the word “it”
The Cumulative Component (also see old Review handouts)
Things to remember include:
· Identifying observational units and defining variables, samples vs. populations vs. what if (sampling/randomization) distributions, parameters vs. statistics, explanatory vs. response variable, bias vs. precision, random assignment vs. random sampling (including goals)
· Experiments vs. Observational Studies
  o How to design a randomized experiment, how to properly select a sample
  o Scope of conclusions depending on how study was conducted (Can you draw a cause and effect conclusion? Can you generalize to a larger population?)
  o Sampling errors, nonsampling errors, and random sample errors (and which of these are measured by the “margin of error”?)
· Describing and comparing distributions of data
  o Categorical: segmented bar graphs, conditional percentages, difference in proportions vs. relative risk vs. odds ratio (and how to interpret)
  o Quantitative: shape, center, and spread, stemplots, boxplots, histograms, dotplots, resistance of median and IQR
  o When describing distributions, if you have access to numerical summaries, use them to support your claims
· How to interpret probability
· How to carry out a test of significance
  o About a population proportion and/or population mean and/or treatment effect and/or difference in population proportions and/or difference in population means
    · Make sure you can state H0 and Ha in symbols and in words
  o One-sided vs. two-sided alternatives
  o Which technical conditions apply and how to check them and what they tell you
    · e.g., proportions: nπ > 10 and n(1 − π) > 10, means: n > 30 or normal population
  o Interpretation of test statistic (if appropriate)
    · General form: (estimate − hypothesized)/(standard error of estimate)
  o Ideas and distinctions of sampling distribution and randomization distribution
  o How to calculate and/or approximate p-value
  o How to make a decision based on the p-value and level of significance
  o How to interpret the p-value
    · Source of randomness, choice of statistic, observed result, direction, null hypothesis
  o Factors that affect the size of the p-value
  o Defining (and stating the consequences of) Type I and Type II Errors in context (including direction of Ha)
  o How to determine the probabilities of a Type I Error and of a Type II Error and Power
    · Type II/Power is for a particular instance of the alternative hypothesis
  o Factors that affect the probability of Type I and Type II Errors, Power
· How to calculate and interpret a confidence interval
  o General form: estimate ± (critical value)×(standard error)
  o Interpretation: level, parameter, context (with differences/ratios, include “direction”)
    · Clarify larger population/process
  o Interpret confidence “level” (separate from interpreting interval)
  o How to solve for the sample size necessary to obtain a specific margin of error for a stated confidence level
· Duality between intervals and tests: Any parameter value not contained in a C% CI will be rejected by a two-sided test at the (100-C)/100 significance level
· Describe the difference between statistical significance and practical significance (is it a meaningful difference in context)
· Calculating p-values for Fisher’s Exact Test and/or binomial process (when ok to do)
· How to decide which procedure you should use (quantitative or categorical data, one or two populations, Fisher’s Exact Test vs. binomial vs. normal vs. t)
· For validity of “theory-based” procedures, I tend to worry less about the randomness condition and more about the sample size condition. The randomness condition is more important to the scope of conclusions.
· Be able to get t* and z* values using technology; use 2 as the approximate multiplier with 95% confidence
· Make sure you know where the parentheses go in the prediction interval standard error
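For the bullets on finding z* with technology and solving for a sample size, here is one possible Python sketch for a single proportion. It assumes the usual margin of error z*·sqrt(p(1 − p)/n) and a conservative guess of 0.5 for the proportion. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(margin, confidence=0.95, guess=0.5):
    """Smallest n with z* * sqrt(guess * (1 - guess) / n) <= margin."""
    z = z_star(confidence)
    # Solve margin = z * sqrt(guess * (1 - guess) / n) for n, then round up.
    return ceil(z**2 * guess * (1 - guess) / margin**2)


n_needed = sample_size_for_proportion(0.03)
```

The 95% critical value comes out close to 1.96, which is why 2 works as an approximate multiplier.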
See summary tables (including on
technology) and end of chapter examples
Remember that mini-project 3 is due on
Wednesday of Finals Week
Some big picture stuff
What is Statistical Inference?
The population parameter and the sample statistic summarize the same
variable. The population parameter summarizes the variable for the population,
which is what we want to know, e.g., μ or μ1 − μ2. However, we can’t observe the whole
population so we don’t know what the parameter value really is. However, we can measure the variable on a
sample or randomized groups and compute a sample statistic, e.g., x̄ or x̄1 − x̄2. The question is what can we infer about the parameter based on this
statistic? Because these statistics follow a null distribution/regular pattern
due to the randomness in the study design, we can estimate or calculate
probabilities of different values of the statistic occurring. Different statistics follow different
distributions, but once we know which distribution we should use, we can make
conclusions about the value of the parameter, e.g., it’s in some interval or we
have evidence that it is not a particular value.
Null Distributions
If we specify a
value for the population parameter, we can take (or simulate) lots of samples
from this population or lots of random assignments and calculate a statistic
for each sample/random assignment. This
allows us to examine the behavior of the statistic so we can discuss the shape,
center, and variability of this “null” distribution. For example, what types of values do we
expect the statistic to have, how far away might the statistic stray from the
hypothesized value of the parameter?
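A null distribution like this can be simulated directly. The Python sketch below does so for a sample proportion, assuming a hypothesized value of 1/3 and a sample size of 60. Both numbers, the seed, and the names are chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_null_proportions(pi_0, n, reps=5000, seed=0):
    """Sample proportions from many samples of size n when the parameter equals pi_0."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi_0 for _ in range(n)) / n for _ in range(reps)]


# Hypothetical setting: hypothesized proportion 1/3, sample size 60.
null_props = simulate_null_proportions(1 / 3, 60)
center = mean(null_props)      # should be near the hypothesized value
spread = stdev(null_props)     # sample-to-sample variability of the statistic
theory_sd = sqrt((1 / 3) * (2 / 3) / 60)   # theoretical standard deviation for comparison
```

The simulated center should land near 1/3 and the simulated spread near the theoretical value, and a histogram of `null_props` would show the shape.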
Simulation vs. “Large sample”
(Theory-Based) procedures
In almost every
case we have seen two different ways to approximate the p-value: simulation and
a mathematical model. We are considering the simulation approaches to always be
valid. The mathematical models are only appropriate if the “validity
conditions” of the theory-based approach are met. The advantage of the mathematical models is
we can easily get a confidence interval as well. So you may want to consider the mathematical
model way first but then if the validity conditions aren’t met, use the
simulation approach.
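To compare the two approaches for one proportion, the following Python sketch computes a simulated p-value and a normal-model p-value for the same hypothetical result (60 successes in 100 trials, hypothesized proportion 0.5). The validity check uses the "at least 10 successes and at least 10 failures" rule applied to the expected counts under the null, which is an assumption about how the condition is checked. All names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_upper_p(k, n, pi_0, reps=10000, seed=0):
    """Proportion of simulated samples with at least k successes when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < pi_0 for _ in range(n))
        if successes >= k:
            hits += 1
    return hits / reps


def normal_upper_p(k, n, pi_0):
    """One-sided p-value from the normal model (one-sample z-test)."""
    z = (k / n - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)
    return 1 - NormalDist().cdf(z)


def validity_conditions_met(n, pi_0):
    """Expected successes and failures under H0 are both at least 10."""
    return n * pi_0 >= 10 and n * (1 - pi_0) >= 10


p_sim = simulated_upper_p(60, 100, 0.5)
p_normal = normal_upper_p(60, 100, 0.5)
```

When the validity conditions hold, as they do here, the two p-values should be similar. When they fail, the simulated value is the one to trust.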
Confidence Intervals Estimate
population parameter
The goal of a
confidence interval is to get a range of plausible values that we think the
population parameter could be equal to.
To do this, we use the sample statistic and a measure of the sampling
(or shuffle to shuffle) variability of the sample statistic. This lets us form an interval around the
sample statistic that should contain the population parameter. Note, we are trying to contain the population
parameter in the interval, not the data and not the sample statistic. In fact,
the sample statistic better be the midpoint (center) of the interval.
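One way to see what the confidence level means is to simulate many samples and check how often the interval captures the parameter. The Python sketch below does this for a proportion, using the interval p̂ ± z*·sqrt(p̂(1 − p̂)/n). The true proportion, sample size, seed, and names are assumptions made only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(pi, n, confidence=0.95, reps=2000, seed=0):
    """Fraction of simulated intervals that capture the true proportion pi."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    captured = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < pi for _ in range(n)) / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        # The interval is centered at the statistic; the question is
        # whether the parameter falls inside it.
        if p_hat - margin <= pi <= p_hat + margin:
            captured += 1
    return captured / reps


rate = coverage_rate(0.5, 100)
```

The long-run capture rate should be close to the stated confidence level, even though any single interval either contains the parameter or does not.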
Tests of Significance Test
claim about population parameter
The goal of a
test of significance is to make a decision about the population parameter. Here are the steps we use:
1) Define the parameter(s) of interest. (Should also be able to define the OUs and variable)
2) Specify the hypotheses (e.g., H0: π = 1/3, μ = 50, μ1 − μ2 = 0, or no relationship between variable 1 and variable 2 in the population)
   Always in terms of the population (parameters) because that’s what is unknown and what we are trying to make statements about (take off the hats!)
   The null hypothesis is the “dull hypothesis” or the “ho-hum hypothesis”
   The alternative hypothesis specifies something interesting (“a-ha!”)
   One or two-sided (decide based on wording of research question)
3) Check the validity conditions, sketch the null distribution of the (test) statistic assuming H0 is true, and identify the appropriate test procedure by name
   If the validity conditions are not met, use a randomization-based (simulation) method instead
4) Compare the data observed in the sample to what’s “expected” from H0. Find the p-value = probability of observing a value of the statistic as extreme or more extreme when H0 is true.
   Know how to get the computer to give you the appropriate one or two-sided p-value
5) Draw a conclusion in context
   Decide to reject or fail to reject H0
   Reject if the p-value is less than the significance level α, synonymous with saying the result is “statistically significant”
   Make conclusion about research question of interest (back to English)
If we repeatedly took different samples or random shuffles and calculated the value
of the statistic for each sample, the p-value indicates how often we would
expect to see the statistic value that we actually did observe, or one more
extreme, when H0 is true. If the statistic value is very unlikely (so small p-value) we stop believing H0
(recall the loaded dice example). We can compare to the significance level α as a benchmark to decide
whether the p-value is “too small.”
T vs Z
With proportions, our observations consist of “yeses” and
“nos” for each observational unit in the population. A picture of this population is simply a bar
graph. In particular, we don’t worry
about its shape or variability. We will always consider approximating the null
distribution of the sample proportions with the normal distribution. Thus, we
never worry about using the t
distribution with proportions. With
means, the t distribution is used to
take into account the extra variation we will see in the null distribution if
we also substitute the sample standard deviation, s, into the equation. The
key is that both a z-statistic and a t-statistic have “standardized” our
observed statistic onto a comparable scale.
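Both statistics follow the general form (estimate − hypothesized)/(standard error of estimate). The short Python sketch below shows the two side by side. The function names and numbers are hypothetical, and the t version plugs in the sample standard deviation s as described above.

```python
from math import sqrt
from statistics import mean, stdev


def standardized(estimate, hypothesized, standard_error):
    """General form: (estimate - hypothesized) / (standard error of estimate)."""
    return (estimate - hypothesized) / standard_error


def z_for_proportion(successes, n, pi_0):
    # The standard error uses the hypothesized proportion, so nothing extra is estimated.
    return standardized(successes / n, pi_0, sqrt(pi_0 * (1 - pi_0) / n))


def t_for_mean(data, mu_0):
    # The standard error substitutes the sample standard deviation s for the
    # unknown population value, which is why the t distribution is used.
    n = len(data)
    return standardized(mean(data), mu_0, stdev(data) / sqrt(n))


# Hypothetical data, made up for illustration only.
z_stat = z_for_proportion(60, 100, 0.5)
t_stat = t_for_mean([48, 52, 50, 54, 46], 47)
```

Each result counts how many standard errors the observed statistic lies from the hypothesized value, which is what puts them on a comparable scale.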
Population vs Variable vs Parameter
A population is a group of objects; a variable is what we measure about the objects,
like a question we ask of each one (e.g., height). The observational units are the objects we
measure (e.g., buildings, volleyball players).
You need to be able to decide how many populations you have and how many
variables, e.g., are you measuring two different things/answering two different
questions about the objects (e.g., height and age), or are you measuring the same
thing on two different groups (e.g., heights of men and heights of women)? Parameters are numbers
that describe the population (e.g., the average height of all buildings, the average age of all
volleyball players); we just may not know their exact numerical value.
Independence/Matched Pairs
We can also assess the “independence”
between samples to justify a two-sample procedure. This is not the same as the independence
between variables. Instead we are making sure the responses of one group are
not influencing or related to the responses in the other group. If they are, then a better analysis is to
take that dependence into account (e.g., a “paired t” procedure).
Independence/Association
First, remember that we talk about independence/association
between two variables. We don’t talk
about the outcomes of the variables or levels of the variables, but the entire
variable. Two variables are associated
if they are related to each other, that is, if knowledge of one gives us information
about the other.
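For two categorical variables, association shows up as conditional proportions that differ across the explanatory groups. The Python sketch below computes the comparisons listed earlier (difference in proportions, relative risk, odds ratio). The function name and the counts are hypothetical.

```python
def compare_groups(success_1, total_1, success_2, total_2):
    """Conditional proportions of success in two groups and three ways to compare them."""
    p1 = success_1 / total_1
    p2 = success_2 / total_2
    return {
        "conditional proportions": (p1, p2),
        "difference in proportions": p1 - p2,
        "relative risk": p1 / p2,
        "odds ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }


# Hypothetical two-way table: 30 of 50 successes in group 1, 20 of 50 in group 2.
summary = compare_groups(30, 50, 20, 50)
```

If the two conditional proportions were equal, the difference would be 0 and both ratios would be 1, so knowing the group would tell us nothing about the response.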
Other notes:
· With bar graphs, always use percentages as the vertical scale (instead of just counts)
· Remember, compare two or more populations OR examine the association between two variables.
· A p-value is not the probability of a null hypothesis or a conclusion being true.
· Remember your p-value allows you to make a conclusion about whether there is evidence against H0 or not. We can’t say “there is strong evidence of no association” because we assumed no association in the calculations/simulation. So all we can say is “there is not strong evidence of an association.”
· Make the link between your p-value and your decision explicit. Don’t forget to then make a conclusion in context.
Question Translations
If the question asks you to:
describe/compare the distribution(s) of a categorical variable | Look at (conditional) proportions
describe/compare the distribution(s) of a quantitative variable | Shape, center, spread. Use mean, median, SD, IQR if available
comment on “statistical significance” or “strength of evidence” | Consider the p-value
estimate “how large the difference is” or “plausible values for the parameter” | Consider the confidence interval
comment on generalizability | Consider the data collection methods and specify a reasonable population
comment on causation | Consider whether you have a randomized experiment and statistical significance
describe a confounding variable | Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable
describe a parameter | Specify the number (e.g., mean or proportion or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population)
interpret a p-value | Begin the sentence “the probability that…” or “the proportion of …” and put your answer in context of the problem (e.g., what source of randomness are we modelling? what statistic are you talking about, what value was observed, what the null hypothesis specified in context, what do you mean by “or more extreme”)
interpret a confidence interval | Begin the sentence “I am 95% confident that <<parameter>> is in (XX, XX)”. Clarify parameter, population, context. If an interval about a difference, clarify which population parameter has a higher value: I’m 95% confident that <<>> is XX to XX (times) (larger, smaller) than <<>>
interpret the confidence level | Talk about the reliability of the method: if you repeated the process for different samples, what percentage of the resulting intervals would succeed in capturing the parameter
draw a conclusion from a p-value (evaluate a p-value) | Comment on whether the p-value should be considered small, reject or fail to reject the null hypothesis, and restate the conclusion you are going with in the study context
calculate a confidence interval by hand | Use 2SD short-cut or TBI applet
state hypotheses | Probably want both null and alternative. Could ask for you to do this in words and/or in symbols. Make sure you are clearly talking about the population parameter and in context
identify the procedure | Name the test you would use (e.g., one proportion). Also be prepared to describe a simulation process you could use to estimate a p-value (e.g., flip a coin X times, shuffle X blue and X green cards X times)
comment on validity conditions | Consider the sample size condition for the relevant procedure as on the Overview of Statistical Procedures handout
If the question asks you to calculate a p-value | Simulation | Exact | Theory-based
One proportion | One proportion applet | Random sampling: Binomial distribution (or hypergeometric if sampling from a finite population) | One-sample z-test (need at least 10 successes and at least 10 failures) – One proportion or TBI applets
One mean | Random sampling: not really, need a population to sample from where the null hypothesis is true (Sampling from Finite Populations applet), could use bootstrapping | Not unless could list out each possible random sample from the population and calculate the statistic for each | One-sample t-test (need n > 30 or symmetric population) – TBI applet or JMP or R
Two proportions | Random sampling: independent random samples from binomial process (applet). Random assignment: Analyzing two-way tables applet | Random sampling: no (but can fix both margins and approximate with Fisher’s). Random assignment: Fisher’s Exact Test (hypergeometric distribution) | Two-sample z-test (need at least 5 successes and at least 5 failures in each group) – Analyzing two-way tables applet or TBI applet or JMP or R
Two means | Random sampling: not really, need populations to sample from that have the same population mean, could use bootstrapping or applet. Random assignment: Comparing Groups (Quant) applet | Random sampling: no. Random assignment: not really, probably too many different random assignments to list out/find the statistic for each | Two-sample t-test (need both n’s > 20 or both populations normal) – TBI applet or JMP or R (can also use Comparing Groups (Quant) applet)
Matched pairs | Quantitative: Matched pairs applet. Categorical: sampling from binomial with n = number of differing responses | Quantitative: not really. Categorical: exact binomial | Quantitative: one-sample t-test on differences (need at least 30 differences or normality of differences). Categorical: z-test for binomial (need at least 10 successes and failures)
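As an example of the "Exact" column for two proportions under random assignment, here is a minimal Python sketch of a one-sided Fisher's Exact Test p-value from the hypergeometric distribution. The function name and table layout are assumptions: the table is [[a, b], [c, d]], and the p-value is the probability that the top-left count is at least a when both margins are fixed.

```python
from math import comb


def fisher_upper_p(a, b, c, d):
    """P(top-left count >= a) for the 2x2 table [[a, b], [c, d]] with margins fixed."""
    row_1 = a + b          # size of the first group
    col_1 = a + c          # total number of successes
    n = a + b + c + d
    largest = min(row_1, col_1)
    # Hypergeometric: choose x of the successes and row_1 - x of the failures
    # to land in the first group, out of all ways to pick the first group.
    favorable = sum(comb(col_1, x) * comb(n - col_1, row_1 - x)
                    for x in range(a, largest + 1))
    return favorable / comb(n, row_1)


# Hypothetical table: all 3 successes fall in the first group of 3.
p_value = fisher_upper_p(3, 0, 0, 3)
```

Here only 1 of the 20 possible random assignments is as extreme as the observed table, so the exact p-value is 1/20.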
Also expect
questions like – how would this (e.g., p-value, margin of error, conclusion)
change if you did this (e.g., change sample size, change hypothesized value,
confidence level), as well as more conceptually-based questions (e.g., what
does it all mean, explain the reasoning, what is this number measuring).