Stat 414 – Final Review Handout
Optional Review Session: Sunday
Final exam: Monday, 7:10-10:00am
Format of Exam: The exam will be cumulative, with maybe a bit of emphasis on applying what we learned before to new situations (e.g., logistic models, longitudinal data, crossed effects). The exam will mostly be interpretation questions. Be ready to explain your reasoning, including in “layperson’s” language. You can use 3 pages of (two-sided) notes. I will be giving you output rather than asking you to run any models in R.
Advice:
Review Quizzes/commentary, HW solutions/grader comments (including on submission page), Lecture notes, textbook. Post questions in the Final Exam discussion board in Canvas.
Overview:
We have learned about multilevel models because they are required for proper analysis of multilevel data, to account for the clustering/nesting in the design of the study. But basically, a multilevel model is simply a regression model that includes the “level 2 grouping variable” in the model to explore the adjusted associations. Adding the Level 2 grouping variable could be done without learning any new methods by including the categorical variable as fixed effects. Alternatively, we have seen how to treat its levels as random effects, admitting that we don’t care too much about the specific observed levels of this level 2 grouping variable but instead want to model the population the groups are sampled from, which adds only one new parameter to the model (the intercept variance). So mostly we have focused on the consequences of that assumption, such as shrinkage of the estimates of the level 2 grouping variable’s “effects,” the intraclass correlation as a measure of the within-group correlation, and random slopes as the parallel to assuming an interaction between level 1 variables and the level 2 grouping variable. We were also exposed to different estimation methods such as maximum likelihood, leading to likelihood ratio tests. This basic structure is easily extended to more than 2 levels and to generalized models such as logistic regression.
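As a compact reference for the overview, here is a sketch of the random intercepts model and the intraclass correlation it implies. The symbols used here (beta_0, beta_1, u_j, e_ij, tau_0^2, sigma^2) are assumed notation and may not match the lecture notes exactly.

```latex
% Random intercepts model for observation i in Level 2 group j (assumed notation)
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \tau_0^2), \quad e_{ij} \sim N(0, \sigma^2)

% Intraclass correlation: share of the total variance that lies between groups
\mathrm{ICC} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}
```

With J groups, the fixed-effects version of this model would estimate J - 1 group coefficients, whereas the random-effects version estimates the single variance tau_0^2.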
Topics since Midterm
From Day 10 handout, you should be able to
· Use graphs, context, and model equation to interpret an interaction in context
o Explain the nature of the interaction
§ Change in slopes = change in effect of x1 on y depending on value of x2
§ NOT the same as x1 and x2 being related to each other
o Interpret signs of coefficients
· Write out separate equations
o Be able to talk about why we don’t just fit separate equations
· Be careful when interpreting “main effects” if have an interaction
o Can describe slope of x1 on y when x2 is at zero (or mean if centered)
· Why it’s useful to center variables involved in an interaction
· Interpret random slopes models (aka “random coefficients”)
o Interpretation as interaction across higher level units
§ e.g., Level 1 variables can have random slopes at Level 2
· Explain the distinction between a random slopes model and fitting a separate equation for each Level 2 group
o Complete pooling vs. Partial pooling vs. No pooling
· Interpret the standard deviation/variance of the slopes
· Write out the Level 1 and Level 2 equations for the random slopes model
o Including specifying error terms, and their distributions, and covariance terms
o Do be careful with indices
o Thinking of Level 2 equations as “intercepts as outcomes” and “slopes as outcomes” models
o Interpretation of the model components
o Generally include correlation between slopes and intercepts (and slopes and slopes) in the model but can force to be zero
· Compare the random slopes to the fixed slopes models and decide significance of random slopes
· Compare relative sizes of variance components in context
· Interpretation/visualization of variance component for slopes
o About 95% of the group slopes should fall within 2 SDs of the overall slope
· Interpretation of covariance/correlation between random intercepts and random slopes
· Distinguish between “random slopes” (the group-specific slopes) and “slope effects” (each group’s deviation from the overall slope)
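The Level 1 and Level 2 equations asked for in the list above can be sketched as follows for a single Level 1 predictor. The notation (beta_0j, beta_1j, u_0j, u_1j, tau_0^2, tau_1^2, tau_01, sigma^2) is assumed for illustration and should be matched to the notation in your notes.

```latex
% Level 1: observation i in Level 2 group j
y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2)

% Level 2: "intercepts as outcomes" and "slopes as outcomes"
\beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}

% Random effects: two variances and one covariance
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix} \right)

% About 95% of the group-specific slopes \beta_{1j} are expected to fall in
\beta_1 \pm 2\,\tau_1
```

In this sketch beta_1j is a group's own slope and u_1j is how far that slope sits from the overall slope beta_1. Forcing tau_01 to zero gives the uncorrelated version of the model.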
From Day 11 (and Sec. 5.2), you should be able to:
· Add a Level 2 variable to explain variation in random slopes
o Inclusion of cross-level interactions
§ Interpretation
o Level equations (adding Level 2 variable to equation for intercept vs. slope vs. both)
o Measuring change in Level 1 and Level 2 variances as percentage change
§ A Level 2 variable could explain pretty much all of the Level 2 variation, in which case it can be sufficient for adjusting for the clustering in the study design
· Explain how random slopes models induce heteroscedasticity in the responses
o Variance as quadratic function of x
o Minimized at x = -cov(u0, u1)/var(u1), where u0 and u1 are the random intercept and slope effects
§ Does this occur within the range of x values in the dataset?
· Explain how random slopes models induce different correlations among pairs of observations in the same group depending on the x values involved
o Translating between covariance and correlation
· Interpret covariance/correlation between random intercepts and random slopes
o Distinguish between cov(y’s) and cov(u’s)
o Sign of covariance and implications for fanning in/fanning out of lines
§ Interpretation in context
o Translating between covariance and correlation
§ Recognizing which is reported in the output
· Interpret the variance-covariance matrix output (marginal vs. conditional)
o Compare model predictions to observed results
· Distinguish between Level 1 variance, Level 2 variance, and total variance
· Explain limitation of ICC in random slopes model
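The variance and covariance calculations in the list above can be sketched in a few lines of Python. This is an illustration under assumed notation (tau0_sq and tau1_sq for the intercept and slope variances, tau01 for their covariance, sigma_sq for the Level 1 variance), and every numeric value is hypothetical rather than taken from course output.

```python
import math

def response_variance(x, tau0_sq, tau01, tau1_sq, sigma_sq):
    """Var(y | x) in a random slopes model: a quadratic function of x."""
    return tau0_sq + 2 * tau01 * x + tau1_sq * x ** 2 + sigma_sq

def x_at_minimum_variance(tau01, tau1_sq):
    """Value of x where the quadratic above reaches its minimum."""
    return -tau01 / tau1_sq

def response_covariance(x1, x2, tau0_sq, tau01, tau1_sq):
    """Cov(y's): covariance of two responses in the same group at x1 and x2."""
    return tau0_sq + tau01 * (x1 + x2) + tau1_sq * x1 * x2

def cov_to_corr(tau01, tau0_sq, tau1_sq):
    """Cov(u's) -> correlation between random intercepts and random slopes."""
    return tau01 / math.sqrt(tau0_sq * tau1_sq)

# Hypothetical variance components, for illustration only
tau0_sq, tau01, tau1_sq, sigma_sq = 4.0, -1.0, 0.5, 9.0

x_min = x_at_minimum_variance(tau01, tau1_sq)
var_at_min = response_variance(x_min, tau0_sq, tau01, tau1_sq, sigma_sq)
rho_u = cov_to_corr(tau01, tau0_sq, tau1_sq)

print("x with smallest response variance:", x_min)
print("response variance there:", var_at_min)
print("corr(intercepts, slopes):", round(rho_u, 3))
```

A negative covariance, as in these made-up numbers, puts the minimum at a positive x, so the group lines fan in before that point and fan out after it. Whether that matters depends on whether the point falls inside the observed range of x. Because the response variance changes with x, a single ICC no longer summarizes the within-group correlation.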
From Day 12, you should be able to:
· Consider random slopes for multiple variables
o Do you want them to be correlated?
o Can increase complexity of model pretty quickly
o Interpretation of random effects correlations (and identifying pairs from output)
· Determine the number of parameters being estimated in a model
o An SD and its variance count as just 1 parameter; include covariances in the count
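Counting parameters, as in the last item above, can be sketched as a small function. It assumes a two-level linear model with one Level 1 residual variance. The function name and arguments are hypothetical and not part of any R output.

```python
def count_parameters(n_fixed, n_random, correlated=True):
    """Count estimated parameters in a two-level linear multilevel model.

    n_fixed:    number of fixed-effect coefficients, including the intercept
    n_random:   number of random effects per group (intercept plus slopes)
    correlated: True if covariances among the random effects are estimated
    """
    variances = n_random                      # one variance (or SD) per random effect
    covariances = n_random * (n_random - 1) // 2 if correlated else 0
    residual = 1                              # Level 1 variance
    return n_fixed + variances + covariances + residual

# Random intercepts only, one predictor: 2 fixed + 1 variance + 1 residual
print(count_parameters(n_fixed=2, n_random=1))

# Random intercepts plus two random slopes, correlated vs. uncorrelated
print(count_parameters(n_fixed=3, n_random=3, correlated=True))
print(count_parameters(n_fixed=3, n_random=3, correlated=False))
```

The covariance count grows quadratically with the number of random effects, which is why adding random slopes for several variables increases model complexity so quickly.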
From Day 13 handout, you should be able to
· Decide when to use a logistic regression model
· Interpret an odds ratio in context
· Explain why we don’t often use linear regression models to model probabilities
· Interpret a basic logistic regression model
o Back transform intercept to predicted probability
o Use exp(slope) as an odds multiplier
o Substitute into equation and back transform to predicted probability
· Continue to consider multiple regression models and adjusted associations
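The back-transformations in the list above can be sketched as follows. The coefficients b0 and b1 are hypothetical values chosen for illustration, not estimates from any course dataset.

```python
import math

def inv_logit(log_odds):
    """Back transform log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# Hypothetical fitted model: log odds = b0 + b1 * x
b0, b1 = -2.0, 0.5

p_at_zero = inv_logit(b0)            # predicted probability when x = 0
odds_multiplier = math.exp(b1)       # odds are multiplied by this per 1-unit increase in x
p_at_four = inv_logit(b0 + b1 * 4)   # substitute x = 4, then back transform

# The ratio of odds at x = 1 and x = 0 matches exp(b1)
odds_ratio = odds(inv_logit(b0 + b1)) / odds(p_at_zero)

print(round(p_at_zero, 4), round(odds_multiplier, 4), round(p_at_four, 4))
```

The slope acts multiplicatively on the odds, not additively on the probability, so the change in predicted probability for a 1-unit increase in x depends on where on the curve you start.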
No Day 14 handout
From Day 15 handout, you should know how to
· Identify the need for a multilevel logistic regression model (response variable is binary rather than quantitative, with clustered data)
o Use a chi-square test to decide whether the level 2 grouping variable is associated with the response variable (aka significant level 2 variation in response)
o Fit a logistic regression model with random intercepts
§ Same as adding a categorical variable with lots of categories that we aren’t really all that interested in individually but want to adjust for their impact on the response
o Compute an intraclass correlation coefficient with a multilevel logistic regression null model
o Interpret the sign of a slope coefficient in a multilevel logistic multiple regression model
§ Predicted probability as increasing or decreasing
§ Adjusting for other variables
§ Slope as “subject specific” rather than “population average” effect (average subject rather than averaged over all the subjects)
o Interpret/visualize random slopes in a multilevel logistic regression model
§ Keep in mind that changing intercepts moves the model left and right, while changing slopes changes the rate of increase/decrease (how quickly it starts the S-shaped pattern)
o Write out the Level 1 and Level 2 and composite equations for multilevel logistic multiple regression model
o Describe the variance component(s)
o Compare models
o Summarize models in context
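For the intraclass correlation item above, one common convention is the latent-variable definition, which fixes the Level 1 variance at pi^2/3 (the variance of the standard logistic distribution). The sketch below assumes that convention, which may or may not be the one used in class, and all values are hypothetical.

```python
import math

# Variance of the standard logistic distribution, used as the Level 1 variance
LOGISTIC_LEVEL1_VARIANCE = math.pi ** 2 / 3

def logistic_icc(tau0_sq):
    """ICC for a random intercepts logistic null model (latent-variable form)."""
    return tau0_sq / (tau0_sq + LOGISTIC_LEVEL1_VARIANCE)

def inv_logit(log_odds):
    """Back transform log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical null-model estimates: overall intercept and intercept variance
beta0, tau0_sq = -1.0, 1.0
tau0 = math.sqrt(tau0_sq)

icc = logistic_icc(tau0_sq)

# Predicted probability for a typical group and for groups 2 SDs below/above it
p_typical = inv_logit(beta0)
p_low = inv_logit(beta0 - 2 * tau0)
p_high = inv_logit(beta0 + 2 * tau0)

print(round(icc, 3), round(p_low, 3), round(p_typical, 3), round(p_high, 3))
```

Note that p_typical is the probability for a group with a random effect of zero (the “subject specific” reading), which is generally not the same as the probability averaged over all groups.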
From Chapter 13, you should be able to
o Identify non-hierarchical models (imperfect hierarchies)
§ Lower-level groups feed into different upper level groups
o Interpret a “crossed-effects” model
§ Multiple sources of “random effects” on the same level
§ Interpret large/small random effects in context
§ Interpret parameter estimates in context
· Still need to consider whether a variable is included in an interaction
· Interpret interaction as changes in slope/effect of other variable
§ Variance components, intraclass correlation coefficient combinations
§ Prediction
§ Random slopes
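The “intraclass correlation coefficient combinations” above can be sketched for a crossed-effects model with two grouping factors and no predictors. The notation (u_j, v_k, tau_u^2, tau_v^2) is assumed for illustration, and the two sets of random effects are taken to be independent of each other.

```latex
% Observation i cross-classified by factor A (level j) and factor B (level k)
y_{i(jk)} = \beta_0 + u_j + v_k + e_{i(jk)},
\qquad u_j \sim N(0, \tau_u^2), \quad v_k \sim N(0, \tau_v^2), \quad e_{i(jk)} \sim N(0, \sigma^2)

% Correlation between two observations that share ...
\text{same } j,\ \text{different } k: \quad \frac{\tau_u^2}{\tau_u^2 + \tau_v^2 + \sigma^2}

\text{different } j,\ \text{same } k: \quad \frac{\tau_v^2}{\tau_u^2 + \tau_v^2 + \sigma^2}

\text{same } j \text{ and same } k: \quad \frac{\tau_u^2 + \tau_v^2}{\tau_u^2 + \tau_v^2 + \sigma^2}
```

Comparing the sizes of tau_u^2 and tau_v^2 shows which grouping factor accounts for more of the variation in the response.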
From Days 16/17 handout, you should know how to
· Apply multilevel models to longitudinal data
o Repeated measurements at Level 1
· Identify time independent (“invariant”) vs. time dependent explanatory variables
· Identify Wide vs. Long format
· Compute percentage of Level 1 variation explained by changes over time as well as changes explained by other variables after accounting for time dependence
· Consider different error structures (AR(1)) at Level 1
o vs. random slopes
o Application of variance/covariance equations
o Comparison to observed correlation matrix
· Consider different forms of association at Level 1 (e.g., quadratic, piecewise)
· Use graphs to suggest components that should be included in model
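Two of the calculations in the list above can be sketched in Python: the correlation pattern implied by an AR(1) error structure, and the percentage of variance explained when moving from one model to another. The sketch assumes equally spaced measurement occasions, and the function names and numbers are hypothetical.

```python
def ar1_correlation_matrix(rho, n_times):
    """Implied correlations among n_times equally spaced occasions under AR(1).

    The correlation between occasions s and t is rho ** |s - t|, so it decays
    as the occasions get farther apart.
    """
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]

def percent_variance_explained(var_before, var_after):
    """Percentage reduction in a variance component between two models."""
    return 100 * (var_before - var_after) / var_before

# Hypothetical AR(1) parameter and four measurement occasions
implied = ar1_correlation_matrix(rho=0.5, n_times=4)
for row in implied:
    print(row)

# Hypothetical Level 1 variances before and after adding time to the model
print(percent_variance_explained(var_before=20.0, var_after=15.0))
```

A random intercepts model implies the same correlation for every pair of occasions, so comparing the observed correlation matrix with the decaying AR(1) pattern is one way to judge which structure fits better.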
Some reminders
· Distinguish between variables that have only level 1 variation, level 1 and level 2 variation, only level 2 variation
o A Level 1 variable can explain variation at Level 2 if the distributions (means) of the Level 1 variable differ across the level 2 categories. Can also increase the Level 2 variation if the associations are in “opposite directions.”
· Discuss random slopes as the interaction between a level 1 variable and the level 2 units/grouping variable
o As a proxy for meaningful Level 2 variables or could be replaced if have access to meaningful Level 2 variables
o Assumes Level 2 random effects are not associated with Level 1 variables (no confounding/have accounted for the relevant variables/model is appropriately specified)
· Can test specific Level 2 variables (e.g., aggregated Level 1 variables) to decide whether the Level 1 association between y and x and the Level 2 association between the group means of y and x significantly differ
o e.g., does living in a more religious country have the same effect as being a more religious person?
o e.g., does being the type of family that lives in poverty (vs. not) have the same effect as a change in poverty status for an individual child?
o Treating the Level 2 grouping variable as fixed instead of random is a way to adjust for all observed and unobserved characteristics that differ from Level 2 unit to unit, whereas the random effects model adjusts for “units like these.”
· Interpretation of different model components in context
o Go beyond “variation in intercepts and slopes” but be able to explain in context what the intercepts and slopes represent
o Including for categorical variables
o Including variance explained
o Including interactions
o Including main effects when have interactions
§ When need to “fix” other variables and when need to set them to zero or mean
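For the reminder above about testing whether the Level 1 and Level 2 associations differ, one standard way to set it up is to add the group mean of x as a Level 2 variable. The sketch below uses assumed notation (beta_1, beta_2, and x-bar_j for the mean of x in group j).

```latex
% Random intercepts model with the Level 1 variable and its group mean
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 \bar{x}_j + u_j + e_{ij}

% Within-group (Level 1) slope: \beta_1
% Between-group (Level 2) slope: \beta_1 + \beta_2

% The two associations differ when \beta_2 is nonzero
H_0: \beta_2 = 0 \quad \text{vs.} \quad H_a: \beta_2 \neq 0
```

In the religiosity example, beta_1 describes being a more religious person within a country, and beta_2 describes the additional association with living in a more religious country.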