Stat 301 – Exam 1
Preparations
Review Problems: Review
problems (solutions)
See also: Investigation 1.16, 1.17, and 1.18,
Examples 1.1-1.3, Chapter 1 Summary
Required
by noon Tuesday: Submit
Review 1 questions (parts 1 and 2) in Canvas discussion boards
Optional: Review session Tuesday, 6:30-7:30 pm,
10-221
Exam Format: The exam will cover topics from Chapter
1 (Lectures 1-12; HW 1-3). The exam questions will be short-answer questions,
often with several questions on the same study (but you do not necessarily have
to answer (a) to try (b) etc.). There may be a few multiple-choice questions, but be
prepared to explain your reasoning. You
will have access to JMP, R, and the Applets and Data files page. You may use one page of your own notes (8.5 x
11, front and back). These are the most
relevant formulas:
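Binomial: $P(X = k) = \binom{n}{k}\pi^k(1-\pi)^{n-k}$, $E(X) = n\pi$, $SD(X) = \sqrt{n\pi(1-\pi)}$
Normal approximation for $\hat{p}$: $E(\hat{p}) = \pi$, $SD(\hat{p}) = \sqrt{\pi(1-\pi)/n}$
Standard score: (observation - mean)/std dev = $(x-\mu)/\sigma$
One-sample z-test statistic: $z = (\hat{p}-\pi_0)/\sqrt{\pi_0(1-\pi_0)/n}$
One-sample (Wald) z-confidence interval: $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$
Adjusted Wald 95% confidence interval: $\tilde{p} \pm 1.96\sqrt{\tilde{p}(1-\tilde{p})/\tilde{n}}$, where $\tilde{p} = (X+2)/(n+4)$ and $\tilde{n} = n+4$
Sample mean: $\bar{x} = \sum x_i/n$
Sample standard deviation: $s = \sqrt{\sum (x_i-\bar{x})^2/(n-1)}$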
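The binomial and standard-score formulas listed above can be checked numerically. The following is a minimal sketch in Python (standard library only), not the course's JMP, R, or applet tools. The values n = 20, π = 0.5, and k = 14 are hypothetical and chosen only for illustration.

```python
from math import comb, sqrt

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi): C(n, k) * pi^k * (1 - pi)^(n - k)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

# Hypothetical values, for illustration only
n, pi = 20, 0.5

p_exactly_14 = binom_pmf(14, n, pi)     # P(X = 14)
expected_count = n * pi                 # E(X) = n * pi
sd_count = sqrt(n * pi * (1 - pi))      # SD(X) = sqrt(n * pi * (1 - pi))
z_14 = (14 - expected_count) / sd_count # standard score of the count 14

print(f"P(X = 14) = {p_exactly_14:.4f}")
print(f"E(X) = {expected_count}, SD(X) = {sd_count:.3f}")
print(f"z-score for 14 successes = {z_14:.2f}")
```

Showing the substituted values this way mirrors what "set up the calculation by hand" asks for: every number in the formula is visible.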
Also see
technology hints below. Keep in mind
that I am not trying to test you on the technology but you should be ready to
use appropriate tools to perform calculations more quickly and to interpret
supplied applet or JMP or R output. You may also be expected to use your
calculator or to set up calculations by hand (show the values substituted into
the formula).
You will be
expected to explain your reasoning, show your steps, and interpret your
results. Make sure, especially when using technology, that your solution
methods are clear!
The exam will
be worth approximately 50 points, so plan to spend one minute per point.
Study Advice: You should study from the text
(including study conclusions, chapter examples, and chapter summary), lecture notes
(ppt), graded homeworks, hw solutions (follow original HW link), and practice
problems. The quiz questions/solutions
should be accessible to you in Canvas. In studying, I recommend going back
through investigations, practice problems, and homeworks, without looking at
the solutions, then check your answers, then repeat. (Solutions to ISCAM investigations and
Practice problems can be found in Canvas on the Home page under Textbook
Resources.) I also strongly believe
working on Project 1 is a good study strategy.
Make sure you:
o
Review
the salmon-colored boxes and Chapter Summaries and Choice of Procedures table
(p. 129, but add to it, see technology guide below)
·
The hypergeometric distribution will not
be covered on Exam 1. We will work with
very large populations and use the binomial approximation to the hypergeometric
or the normal approximation.
o
Review
Examples 1.1, 1.2, and 1.3 at the end of chapter
o
See
HW notes below
Overview: The exam will focus on studies that involve
one binary categorical (i.e., yes/no) variable, where the data are a sample of
independent (repeat) observations from a random process (the randomness is in
the outcome) or a random sample from a large population (the randomness is in
which observational units are in your sample). We have studied two main types
of statistical inference:
•
Statistical significance, where the goal is to assess the degree
to which the sample data provide evidence against a null hypothesis and in
support of a research conjecture;
•
Statistical confidence, where the goal is to estimate a
population parameter with an interval of plausible values.
Big Idea:
We have a
categorical variable and we have gathered observations from a random process or
a random sample from a larger population. From that sample, we want to infer something about the underlying process or population. In other words, we want to use the statistic (which we calculate from our sample data) to test claims
about (test of significance), or to estimate (confidence interval), the value of
the parameter (which we don’t know). To do this, we need to assess the amount
of “random variation” in our statistic, how much it varies by chance
alone. We can use simulation or the
binomial distribution or (often) the normal distribution to predict what that
variation looks like. If our model is
appropriate, then we know how far the statistic might be varying randomly from
the parameter.
From Day 1 and Investigations A and B
you should be able to:
·
Critique
and suggest suitable comparisons to answer a research question
·
Describe
the distribution of a quantitative variable (shape, center, variability,
outliers)
o
Interpret
the mean and standard deviation of a data set
o
Interpret
a histogram of a quantitative variable
o
Remember
to talk in terms of distribution not
just individual values
·
Anticipate
and explain variable behavior including outliers
·
Interpret
probability as a long-run proportion (under identical conditions)
·
Interpret
expected value as a long-run average
·
Use
simulation to estimate a probability
·
Distinguish
between “exact” probability calculations and simulated results
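The distinction between a simulated estimate and an "exact" probability, as listed above, can be illustrated with a short sketch. This is Python (standard library only), not the applet used in class. The scenario (20 independent trials, success probability 0.5, at least 14 successes), the seed, and the number of repetitions are all hypothetical choices.

```python
import random
from math import comb

random.seed(301)  # fixed seed so the run is reproducible (arbitrary choice)

# Hypothetical scenario: 20 independent trials, success probability 0.5
n, pi, k = 20, 0.5, 14
reps = 10_000

def one_sample_count(n, pi):
    """Simulate n independent trials and return the number of successes."""
    return sum(random.random() < pi for _ in range(n))

# Simulated estimate: the proportion of repetitions with at least k successes
hits = sum(one_sample_count(n, pi) >= k for _ in range(reps))
simulated = hits / reps

# "Exact" value: add the binomial probabilities for k, k+1, ..., n
exact = sum(comb(n, j) * pi**j * (1 - pi)**(n - j) for j in range(k, n + 1))

print(f"simulated P(X >= {k}) = {simulated:.4f}")
print(f"exact     P(X >= {k}) = {exact:.4f}")
```

The simulated value changes from run to run (with a different seed) and approaches the exact value as the number of repetitions grows, which is the long-run proportion interpretation of probability.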
From Section 1.1 you should be able to:
·
Define
the observational units and variable of interest in a study
·
Classify
the variable as quantitative or categorical
·
Produce
a bar graph to summarize a categorical variable (by hand or with technology
using summarized data)
·
Calculate
a statistic to summarize a binary variable (e.g., sample count, X, or sample proportion, $\hat{p}$)
·
Define
a corresponding parameter of interest in the study in words (e.g., process
probability, $\pi$)
·
Describe
how to carry out a tactile simulation to represent a “random choice” process
(e.g., with a coin or a die or a spinner) and to estimate a p-value
·
Describe
and interpret the results of a simulation
·
Use
the One
Proportion Inference applet to carry out a simulation to represent a
binomial process and to estimate a p-value
·
Set
up a binomial probability calculation given values for n and $\pi$ (show numbers plugged into equation, use P(X >
k) notation)
·
Calculate
an exact p-value using the binomial distribution (iscambinomprob or JMP
Distribution Calculator or One Proportion Inference applet)
·
Provide
a “layman’s” interpretation of p-value in your own words in the context of the
research question
·
Explain
what is meant by “statistical significance” and how it is assessed
·
Draw
a conclusion about the “random chance” hypothesis based on a p-value
·
State
null and alternative hypothesis in symbols and in words (including choosing
less than, greater than, or not equal to for the alternative)
·
Carry
out a binomial test of significance
1.
Define
parameter
2.
State
hypotheses (one or two-sided)
3.
Calculate
p-value (one or two-sided) using binomial distribution (iscambinomtest or JMP
Analyze > Distribution > Test Probabilities, or One Proportion Inference
applet)
4.
Make
a decision to reject or fail to reject the null hypothesis based on the
magnitude of the p-value
5.
Make
a final conclusion in context about
the research question
·
Interpret
a confidence interval as a range of plausible values for the parameter (those
not rejected by a two-sided test)
·
Use
technology to obtain a binomial confidence interval (iscambinomtest or JMP
Confidence Interval for One Proportion)
·
Define
Type I and Type II errors for a particular context
·
Know
that the level of significance ($\alpha$) controls the probability of a Type I
Error
·
Be
able to also describe the consequences
of each type of error in context
·
Use
technology (iscambinompower or JMP(View > JMP Starter) DOE > Sample Size
and Power or Power
Simulation applet) to calculate power using the binomial distribution for a
given alternative value
·
Remember,
it’s a two-step process
·
Visual
·
Identify
the factors that affect power and how
·
Understand
idea of using technology to determine the sample size necessary to achieve a
stated power for a particular value of the alternative
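The p-value step of the binomial test described above can be sketched in Python (standard library only). This is an illustration, not the iscambinomtest or JMP implementation. The two-sided rule used here (add up the probabilities of all outcomes that are no more likely than the observed one) is one common convention and is an assumption; your technology may use a different rule. The data (14 successes in 16 trials, hypothesized value 0.5) are hypothetical.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def binom_p_value(x, n, pi0, alternative="greater"):
    """Exact binomial p-value for observed count x under H0: pi = pi0."""
    probs = [binom_pmf(j, n, pi0) for j in range(n + 1)]
    if alternative == "greater":      # Ha: pi > pi0, so P(X >= x)
        return sum(probs[x:])
    if alternative == "less":         # Ha: pi < pi0, so P(X <= x)
        return sum(probs[:x + 1])
    if alternative == "two-sided":    # outcomes no more likely than x
        cutoff = probs[x] * (1 + 1e-7)  # small tolerance for rounding
        return min(1.0, sum(p for p in probs if p <= cutoff))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical data: 14 successes in 16 trials, H0: pi = 0.5
p_greater = binom_p_value(14, 16, 0.5, "greater")
p_two_sided = binom_p_value(14, 16, 0.5, "two-sided")

print(f"one-sided p-value = {p_greater:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

A small p-value here would be read as: if the null hypothesis were true, a result at least this extreme would rarely occur by chance alone. The conclusion still has to be stated in the context of the study.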
From Section 1.2 you should be able to:
·
Explain
what is meant by the “sampling distribution of the sample proportion”
·
Determine
whether or not the normal approximation is reasonable (show details) for the
sampling distribution of the sample proportion (be able to label and sketch the
predicted distribution)
·
Determine
the mean and standard deviation for the sampling distribution of the sample
proportion
o
Apply
the CLT to predict the shape of a sampling distribution, including drawing a
well-labeled and partially scaled (3-5 values on the horizontal axis) sketch of
the distribution and shade the area of interest
o
Consider
probabilities as areas under a continuous mathematical probability curve
·
Calculate
and interpret the z-score for a
sample proportion
·
Carry
out a one-proportion z-test of
significance
1.
Define
parameter
2.
State
hypotheses (one or two-sided)
3.
Be
able to report and interpret the test statistic
4.
Check
whether the procedure is valid for the sample size used
5.
Calculate
a p-value (one or two-sided) using the normal approximation (R
iscamonepropztest, or JMP (Journal) Hypothesis Test for One Proportion, or Theory-Based
Inference applet)
6.
Make
a decision to reject or fail to reject the null hypothesis based on the
magnitude of the p-value
7.
Make
a final conclusion in context about
the research question
·
Apply
and explain the logic behind a continuity correction for the p-value
·
Calculate
power using the normal distribution for a given alternative value (p. 88)
·
Solve
for the sample size necessary to achieve a certain level of power
·
Use
technology to calculate a one-sample z-interval (R iscamonepropztest or JMP
(Journal) Confidence Interval for One Proportion, or Theory-Based Inference
applet)
·
Be
able to change the confidence level
·
Be
able to explain the components of the confidence interval formula (e.g.,
midpoint, width)
·
Determine
and interpret margin-of-error as the measure of expected random sampling error
·
Identify
the factors that affect the midpoint and width
·
Be
able to solve for the sample size necessary to achieve a desired margin of
error (p. 78)
·
Be
able to interpret confidence level in
terms of the reliability of the method
·
Apply
and explain the Adjusted Wald procedure for 95% confidence
·
Decide
when to use Wald vs. Adjusted Wald vs. Binomial and when they will be similar
·
Describe
and utilize the duality between
two-sided tests and confidence intervals
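The one-proportion z-test, the Wald interval, and the Adjusted Wald interval from this section can be sketched in Python using the standard library's NormalDist. This is not the iscamonepropztest, JMP, or applet output, and it does not check the sample size conditions for you. The data (60 successes in 100 observations, hypothesized value 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

def one_prop_z_test(x, n, pi0, alternative="two-sided"):
    """Return (z, p-value) for H0: pi = pi0 using the normal approximation."""
    p_hat = x / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)  # SD uses the null value
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

def wald_interval(x, n, conf=0.95):
    """p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    z_star = Z.inv_cdf(1 - (1 - conf) / 2)  # critical value for this level
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def adjusted_wald_95(x, n):
    """Wald formula after adding 2 successes and 4 observations."""
    return wald_interval(x + 2, n + 4, conf=0.95)

# Hypothetical data: 60 successes in 100 observations, H0: pi = 0.5
z, p_value = one_prop_z_test(60, 100, 0.5)
wald_lo, wald_hi = wald_interval(60, 100)
adj_lo, adj_hi = adjusted_wald_95(60, 100)

print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print(f"Wald 95% CI:          ({wald_lo:.3f}, {wald_hi:.3f})")
print(f"Adjusted Wald 95% CI: ({adj_lo:.3f}, {adj_hi:.3f})")
```

The test uses the hypothesized value in the standard deviation while the interval uses the sample proportion, which is one reason the sample size checks differ between the two procedures.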
From
Section 1.3 you should be able to:
·
Define
the population, sample, sampling frame, statistic, and parameter for a
particular study context
·
Use
appropriate symbols to refer to parameters and statistics (mean, standard
deviation, proportion)
·
Decide
whether a sampling method is unbiased
by
·
Examining
the sampling distribution of the statistic, and determining whether it is
(approximately) centered at the parameter value
·
Considering
whether the sampling frame is complete and the selection method is random,
based on a description of the sampling process.
·
Be
able to conjecture with justification a direction for sampling or nonsampling
bias (likely to systematically produce over or underestimates of the parameter
value)
·
Know
the difference between “bias” and an unlucky sample
·
Produce
a simple random sample from a sampling frame, e.g., with GRN applet, Random.org
·
Describe
the concept of (random) sampling variability to a nonstatistician
·
Identify
the following sampling methods from a description: systematic sampling,
multistage sampling, stratified sampling
·
Explain
how they differ from a simple random sample
·
Suggest
sampling and nonsampling errors present in a study context (see Investigation 1.15;
Example 1.3)
·
Describe
the difference between statistical significance and practical significance
(Investigation 1.17)
·
Realize
that when we are sampling from a finite population, the binomial distribution
is an approximation
·
This
approximation is more valid the larger the population size compared to the
sample size
·
When
this approximation is valid, we apply all the same techniques (e.g.,
simulation, binomial, normal) as earlier in the chapter.
·
When
this approximation is valid, neither the population size nor the percentage of
the population sampled influences our statements of significance or confidence
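The point above, that the binomial approximation improves as the population grows relative to the sample, can be illustrated numerically. The sketch below (Python, standard library only) compares exact sampling-without-replacement probabilities (hypergeometric) with binomial probabilities. The hypergeometric distribution is not covered on Exam 1; it appears here only as the benchmark. The sample size of 10, the 40% success rate, and the population sizes are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(k successes) when sampling n without replacement from a population
    of size N that contains M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

# Hypothetical setting: sample of 10 from populations with 40% successes
n, pi = 10, 0.4

max_gap = {}
for N in (50, 500, 50_000):
    M = round(N * pi)  # number of successes in the population
    # largest difference between the two models over all possible counts
    max_gap[N] = max(
        abs(hypergeom_pmf(k, N, M, n) - binom_pmf(k, n, pi))
        for k in range(n + 1)
    )
    print(f"N = {N:>6}: largest difference in P(X = k) = {max_gap[N]:.5f}")
```

Once the population is many times larger than the sample, the two sets of probabilities are nearly identical, which is why the population size drops out of the calculations.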
Technology Summary
·
To calculate/estimate a probability from
a binomial distribution knowing n and $\pi$
o
One
Proportion Inference applet
o
JMP:
Distribution Calculator (Journal)
o
R:
iscambinomprob
·
To calculate a probability from a normal
distribution knowing mean and std dev
o
Normal
Probability Calculator Applet
§
Easy
to label horizontal axis
o
JMP:
Distribution Calculator (Journal)
o
R:
iscamnormprob
All
three methods allow you to find the probability above, below, between, or
outside values
·
FYI: To calculate a percentile from a normal
distribution knowing mean and std dev
(you know the probability and want to find the corresponding observation,
z-score)
o
Normal
Probability Calculator Applet
§
Enter
value in probability box and press enter or click mouse elsewhere
o
JMP:
Distribution Calculator (Input probability and calculate quantiles)
o
R: iscaminvnorm
o
You
can do something like this with the binomial distribution as well
·
FYI: To find critical values (z*) from a
standard normal distribution (mean = 0, SD = 1)
o
Normal
Probability Calculator applet, specifying the tail probabilities (1-C)/2 and
pressing Enter
o
JMP:
Distribution Calculator (Input probability and calculate quantiles)
o
R:
iscaminvnorm
·
To calculate the exact binomial p-value
o
One
proportion Inference applet
§
Check
the Exact Binomial box
o
JMP:
Analyze > Distribution (one-sided alternative hypothesis)
§
Can
also use Distribution Calculator
o
R:
iscambinomprob
·
To approximate a binomial p-value
o
Simulation: One Proportion Inference applet,
especially when CLT does not apply
§
Make
sure to run enough repetitions for a simulation-based p-value
§
Can
also calculate the exact binomial p-value or the normal approximation
o
CLT: Theory-Based Inference Applet (one proportion)
§
Includes
graph (can paste in raw data) and Ho/Ha statements
§
Uses
normal approximation
§
Allows
continuity correction
o
JMP:
(Journal) Hypothesis Test for One Proportion (z-test)
§
Includes
Ho/Ha, p-value format
o
R:
iscamonepropztest
·
To calculate an exact binomial
confidence interval
o
JMP:
(Journal) Confidence Interval for One Proportion
o
R:
iscambinomtest
·
To calculate a one-sample z-confidence
interval
o
Theory-Based
Inference applet (one proportion)
o
JMP:
(Journal) Confidence Interval for One Proportion
§
If
you use Analyze > Distribution you get the “score interval” (p. 85)
o
R:
iscamonepropztest
With
95% confidence, you can use the Adjusted Wald by specifying 2 more successes and
4 more observations.
·
To calculate power
o
Power
Simulation applet (simulation or exact or normal approximation)
o
JMP:
DOE > Sample Size and Power (binomial = Exact Clopper-Pearson)
o
R:
iscambinompower, iscamnormpower
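The two-step power calculation referred to in this guide (first find the rejection region under the null hypothesis, then find the probability of landing in that region under the alternative) can be sketched in Python with the standard library. This is not the iscambinompower, JMP, or applet implementation. It handles only a "greater than" alternative, and the values n = 20, null value 0.5, alternative value 0.7, and significance level 0.05 are hypothetical.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def upper_tail(k, n, pi):
    """P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(binom_pmf(j, n, pi) for j in range(k, n + 1))

def binom_power_greater(n, pi0, pi_alt, alpha=0.05):
    """Power of the exact binomial test of H0: pi = pi0 vs Ha: pi > pi0."""
    # Step 1: rejection region. Smallest cutoff k with P(X >= k | pi0) <= alpha.
    cutoff = next(c for c in range(n + 2) if upper_tail(c, n, pi0) <= alpha)
    type1_prob = upper_tail(cutoff, n, pi0)   # actual Type I error probability
    # Step 2: probability of that same region when pi = pi_alt.
    power = upper_tail(cutoff, n, pi_alt)
    return cutoff, type1_prob, power

# Hypothetical setting: n = 20, H0: pi = 0.5, alternative value 0.7
cutoff, type1_prob, power = binom_power_greater(20, 0.5, 0.7, alpha=0.05)

print(f"reject H0 when X >= {cutoff}")
print(f"P(Type I error) = {type1_prob:.4f}")
print(f"power at pi = 0.7: {power:.4f}")
print(f"P(Type II error) at pi = 0.7: {1 - power:.4f}")
```

Because the count is discrete, the actual Type I error probability is usually below the stated significance level, and rerunning with a larger n or an alternative value farther from the null shows how those factors affect power.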
Applets you don’t need to use on Exam 1
·
Descriptive
Statistics
·
Random
Babies (Just remember how to interpret “probability”)
·
Reese’s
Pieces or Colored Candies (are just special cases of One Proportion Inference
applet)
·
Simulation
Confidence Intervals (Just remember how to interpret “confidence”)
·
Sampling
Words (Just remember role of population size in our calculations)
Which distribution do I use to find a
p-value or a confidence interval?
·
You
have several options for categorical data (assuming you are sampling a binary
variable from a process or a large population)
o
Simulation,
although we don’t have a confidence interval or power formula
o
The
binomial distribution
o
The
normal distribution if the
conditions for the CLT are met
Miscellaneous
•
Be
able to define a probability as a long-run proportion (whether it’s a probability
from a model, from a normal distribution, from a p-value)
•
Clearly
differentiate parameters from statistics (e.g., long-run proportion or
proportion of all adults)
•
Don’t
mix counts, proportions, percentages
•
Be
able to state hypotheses in symbols and/or words
o
Use
symbols correctly (e.g., know when you are using $\hat{p}$ and when $\pi$ or $\pi_0$)
•
Clearly
explain how you are finding your output (e.g., which command used)
•
Choice
of success is often arbitrary, just make sure you are consistent
•
Thinking
about your sample size can often help you define the observational units
•
Be
able to define the observational units and variable in our “null” distributions
(aka sampling distributions) vs. the sample distribution
•
A
calculation will seldom be the end of the question – always be on the lookout
for “and interpret”
•
We
can now give better answers to some of the early “generalizability” questions
•
Always
put your comments in context
•
Be
able to sketch and label the predicted null distribution
•
Know
the difference between “predicted” and “theoretical” values (e.g., for mean and
SD, p-value)
•
You
won’t be asked to take derivatives but should be able to use the lessons
learned
o
SD($\hat{p}$) is maximized at $\pi$ = .5
o
Sample
size effects are larger than the effects of $\pi$ on SD($\hat{p}$) but exhibit diminishing returns
o
1/√n
is a pretty good approximation of the margin-of-error for 95% confidence for $\pi$.
•
It’s
possible I will say find p-value or interval and if normal approximation is not
valid you should not use it
o
Remember
the sample size checks differ slightly between a test and an interval
o
For
proportions: Binomial and Adjusted Wald can be used with any sample size
•
Be able
to explain what is meant by “95% confidence” in your own words, in context,
without using the words confidence, probability, sure, or chance
•
Be
able to interpret a p-value in your
own words, not only evaluate
•
Know
the factors that affect test statistic, p-value, confidence intervals, and
power/types of error probabilities
•
Be
able to perform a continuity correction (for tail probabilities, “outside” and
“between”; counts and/or proportions)
•
Keep
in mind we never get evidence for the
null, only lack of evidence against it
o
Absence of evidence is not evidence of absence
•
When
making a choice between two options, you should argue both for one and against the other (sometimes you tell me one has one
property/advantage but don’t really tell me why the other does not)
o
Make
sure your explanations/justifications aren’t too “circular” (e.g., I have a
larger confidence level because I am more certain the parameter is contained in
the interval)
•
Be
able to evaluate the appropriateness of a model, understand the assumptions underlying
a model
o
E.g.,
how to check the four conditions of a binomial model (e.g., is it ok to assume
the infants’ choices are independent of each other)
o
E.g.,
how to also check the sample size
conditions for a normal approximation to the binomial
•
You
won’t do a lot of hand calculations but may be asked to set up an equation
(e.g., show the values substituted in) or explain a property using the equation
(e.g., because n is in the denominator)
•
We
don’t always want to assume 0.5 in Ho/Ha.
The choices of hypothesized value and alternative direction are based
entirely on the research question, not anything about the observed sample data.
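The lessons in this list about the standard deviation of the sample proportion (largest at 0.5, diminishing returns in sample size, and 1/√n as a rough 95% margin of error) can be checked numerically. The sketch below is Python (standard library only). The sample sizes, the desired margin of 0.03, and the helper name n_for_margin are hypothetical, and 1.96 is used as the 95% critical value.

```python
from math import ceil, sqrt

def sd_p_hat(pi, n):
    """SD of the sample proportion: sqrt(pi (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

# Lesson 1: for a fixed n, the SD is largest when pi = 0.5
n = 400
sds = {pi: sd_p_hat(pi, n) for pi in (0.1, 0.3, 0.5, 0.7, 0.9)}
for pi, sd in sds.items():
    print(f"pi = {pi}: SD = {sd:.4f}")

# Lesson 2: diminishing returns. Quadrupling n only halves the SD.
sd_100, sd_400, sd_1600 = (sd_p_hat(0.5, m) for m in (100, 400, 1600))
print(f"SD at n = 100, 400, 1600: {sd_100:.4f}, {sd_400:.4f}, {sd_1600:.4f}")

# Lesson 3: 1/sqrt(n) is close to the 95% margin of error when pi is near 0.5
margin_95 = 1.96 * sd_p_hat(0.5, n)
rule_of_thumb = 1 / sqrt(n)
print(f"95% margin = {margin_95:.4f}, 1/sqrt(n) = {rule_of_thumb:.4f}")

def n_for_margin(margin, z_star=1.96, pi_guess=0.5):
    """Smallest n whose margin of error is at most `margin`
    (pi_guess = 0.5 is the conservative choice)."""
    return ceil((z_star / margin) ** 2 * pi_guess * (1 - pi_guess))

n_needed = n_for_margin(0.03)
print(f"n needed for a 0.03 margin at 95% confidence: {n_needed}")
```

Rounding up in n_for_margin reflects that the sample size must be a whole number large enough to meet the target margin.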
Advice:
•
Part
of your grade will be based on communication.
Be precise in your statements and use of terminology. Avoid unclear statements, and especially
don’t use the word “it”! Always relate your comments to the study context.
o
I
would also avoid “data,” “results,” “accurate”
o
Also
say the distribution of what and the
standard deviation of what
•
Show
the details of any of your calculations (including sample size checks)
•
Organize
notes for efficient retrieval of information/formulas
•
Don’t
plan to use notes too much
o
Prepare
as if exam were closed book/notes
o
Focus
on understanding, not memorization
o
Be
cognizant of time constraint
•
Expect
similar questions to what we answer in class every day, on HW
o
Also
be ready for “what if” questions (small changes that require you to conjecture
and explain more than perform additional calculations)
•
Be
sure to explain any assumptions you are making along the way
•
Be
prepared to think/explain/interpret
o
Not
just plug into formulas
o
Be
ready to explain process of how you would do calculations
§
E.g.,
p-value = Pr(X ≤ k), where X ~ Binomial(n, π)
o
Be
able to both make conclusions from a
p-value (evaluate) and provide a
detailed interpretation of what the p-value measures in context (interpret).
o
Be
succinct in your answers (using acceptable statistical terms helps with this,
but don’t use them incorrectly)
•
Be
ready to interpret computer output
o
You
may ask clarifying technology questions during the exam
•
Read
carefully
•
Be
sure to answer the question asked
•
Take
advantage of information provided
•
Relate
conclusions to context
•
Prepare
as thoroughly as you would for a closed-book exam
o
Re-work
in-class investigations
o
Re-work
HW questions
o
Work
through examples
o
Re-read
wrap-up sections
o
Come
to Tuesday’s class prepared with questions
o
Bring
questions to office hours, Canvas discussion boards