Stat 414 – Review I
Due (preferably before Monday, 8am): In Canvas,
(1) post at least one question you have on the material described below.
(2) post at least one example question that you think I could ask on the material described below
Optional Review Session: Monday 7-8pm (zoom), Tuesday 7-8pm (zoom)
Optional Review Problems: review1probs.html
Format of Exam: The exam will cover Lectures 1–8, Quizzes 1–8, Computer Problems 1–8, and Homeworks 1–4. There will be a “written” component and a “computer” component.
· For the written component, you should expect me to provide output and focus on interpretation of the models and concepts. Be ready to explain your reasoning, including in “layman’s” language, as well as to write model equations and interpret variance-covariance matrices. You may use one 8.5 x 11 page of notes (hard copy).
· For the computer component, you will be expected to access a data set and answer questions about it. (For example, I could ask you to calculate an ICC from output or using the computer after you decide what model to run.) The two components will be designed to take 75 minutes. The exam will be worth approximately 75 points. This portion is open notes. All reference material should come from this course only.
Advice: Prepare as if it were a closed-book exam. Review Quizzes/commentary, HW solutions/commentary, and Lecture notes. The first exam is really all about components of regression models: model assumptions, residual plots (standardized residuals), model equations (E(Y) vs. ŷ, slope vs. intercept), residual error (MSE, SE of residuals), indicator vs. effect coding, centering, adjusted vs. unadjusted associations (individual vs. group), random vs. fixed effects, and basic mixed models.
Italics indicate related material that may not have been explicitly stated.
What you should recall from previous statistics courses
· Mean = average value, SD = typical deviation in data from average value
· The basic regression model, Yi = β0 + β1xi + εi
o εi represents the part of the response (dependent) variable that cannot be explained by the linear function.
o Matrix form: Y = Xβ + ε
§ σ² = Var(εi) measures the unexplained variability of the responses about the regression model
o Least squares (OLS) estimation minimizes SSError = Σ(yi – ŷi)²
§ MSE = SSError/(n – p – 1) (p = number of slopes)
§ s = √MSE = residual standard error
o Use output to write a fitted regression model using good statistical notation (see the R sketch at the end of this bullet)
o Interpreting regression coefficients in context
§ β0 = expected vs. predicted mean response when all x = 0
· But often extrapolating unless the explanatory variable(s) have been centered
§ β1 = expected vs. predicted change in mean response for x vs. x + 1
o The difference between the model equation (e.g., E(Y) = β0 + β1x) and the prediction equation (e.g., ŷ = b0 + b1x)
o Checking model assumptions
§ Residuals vs. fits should show no pattern to satisfy Linearity (E(Y) vs. x)
§ Errors should be Independent
§ Residuals should look roughly Normal (Y | x ~ Normal)
§ Residuals vs. fits should show no fanning to satisfy Equal variance
· Var(Y | x) is the same at each value of x
o equal variance = homoscedasticity; unequal variance = heteroscedasticity
§ Violation of model assumptions usually leads to bad estimates of standard errors
· Usually underestimating the SEs (overstating significance)
o Also assuming x values are fixed and measured without random error
o Also assuming you have the “right” variables in the model
o Unusual observations can be very interesting to explore
§ High leverage: observation is far from the center of the x-space
§ Influential: removal of the observation substantially changes the regression model (e.g., Cook’s distance explores the change in fitted values)
· Combination of high leverage and/or large residual
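A minimal R sketch (simulated data and made-up names) of fitting a simple regression, reading off the pieces for the fitted equation, and checking the LINE assumptions with standardized residual plots:

    # simulated data purely for illustration
    set.seed(414)
    dat <- data.frame(x = runif(30, 0, 10))
    dat$y <- 3 + 2 * dat$x + rnorm(30, sd = 2)

    fit <- lm(y ~ x, data = dat)
    summary(fit)                        # read off b0, b1, s, R^2 to write y-hat = b0 + b1*x

    plot(fitted(fit), rstandard(fit))   # no pattern -> linearity; no fanning -> equal variance
    abline(h = 0, lty = 2)
    qqnorm(rstandard(fit)); qqline(rstandard(fit))   # roughly straight -> normality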
· Use individual t-statistic or F-statistic to judge significance of association
o t-statistic is adjusted for all other variables in the model (two-sided p-values assuming the variable was the last one entered into the model)
§ Values larger than 2 are generally significant
§ For individual terms, generally t = (estimate – 0)/SE(estimate)
o overall F-statistic assumes all slopes are zero (compares full model to model with only intercept) vs. at least one slope differs from zero
o partial F-test assumes some slopes are zero (compares full model to reduced model; see the sketch below)
§ especially helpful for categorical variables or after adjusting for a specific subset of variables
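A short R sketch (hypothetical variables x1–x3) of both ideas: each t-statistic is adjusted for the other terms, and anova() on nested models gives the partial F-test:

    set.seed(1)
    d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
    d$y <- 1 + 2 * d$x1 + rnorm(50)

    full    <- lm(y ~ x1 + x2 + x3, data = d)
    reduced <- lm(y ~ x1, data = d)

    summary(full)$coefficients   # each t is "as if entered last" (adjusted)
    anova(reduced, full)         # partial F-test: H0 says the x2 and x3 slopes are both 0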
· Analysis of Variance (ANOVA)
o Traditionally thought of as a method for comparing population means, but more generally is an accounting system for partitioning the variability of the response into model and error
o To compare groups, assume equal variances
§ sp = residual standard error = root mean square error = “pooled standard deviation”
o SSTotal = Σ(yi – ȳ)² = SSModel + SSError, with df = n – 1
o F = MSGroups / MSError (between-group variation / within-group variation)
§ Values larger than 4 are generally significant
§ Equivalent to a pooled two-sample t-test (when comparing two groups)
· R² = percentage of variation in response explained by the model
o R² = SSModel/SSTotal = 1 – SSError/SSTotal
o R²adj = 1 – MSE/(SSTotal/(n – 1))
o Explain similarities and differences in how R² and R²adj are calculated (whether or not they adjust for df)
· Changes that you expect to see when you make modifications to the model, e.g., adding a variable
o Explain that adding terms to the multiple regression model “costs a df” for each estimated coefficient and reduces df error
o If the new variable is “orthogonal” to current variables, coefficients won’t change but SSError will (hopefully) decrease
o If the new variable is related to current variables, “adjusted associations” can look very different from unadjusted associations
§ Interpreting coefficients MUST account for the other variables in the model
o Advantages/Disadvantages to treating a variable as quantitative or categorical
· Using MSError and R² to evaluate model performance
o Slight differences in interpretation (e.g., typical prediction error vs. comparison to original amount of variation in response)
o Comparing models
· Explain that least squares estimates are unbiased for regression coefficients (and how we decide)
From Day 1, you should be able to
· Identify whether a study plan is likely to result in correlated observations
o e.g., repeat observations, clustered data, cluster-randomized trials
· Interpret sample standard deviation (even for a “no variable model”) as “typical prediction error”
· Explain that “least squares regression” (OLS) and minimizing the sum of squared residuals is just one possible method for fitting a model
o Explain the relationship between residual standard error and MSError
· Interpret intercept and slope coefficients in context
o Consider using something other than a “one unit” increase depending on scaling/context
· Explain that maximum likelihood estimation is another method for fitting a model that focuses on maximizing the ‘likelihood’ of the observed data
o Estimates of coefficients usually match but estimates of variability can change (e.g., residual standard error)
o (Day 2) REML estimates variance terms slightly differently and usually looks more like OLS
o When sample sizes are large there aren’t major differences among the three estimation methods (see the sketch below)
o Be able to identify which method is being used by the software
· Determine the df used by a model
o OLS vs. ML/REML
o Including if additional variance terms are used in the model (e.g., per month)
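A sketch (simulated data) of seeing the three estimation methods side by side; nlme::gls is used here just to get ML and REML fits of the same model:

    library(nlme)
    set.seed(2)
    d <- data.frame(x = rnorm(40))
    d$y <- 5 + 3 * d$x + rnorm(40)

    ols  <- lm(y ~ x, data = d)                    # sigma-hat divides SSError by n - 2
    ml   <- gls(y ~ x, data = d, method = "ML")    # divides by n (slightly smaller)
    reml <- gls(y ~ x, data = d, method = "REML")  # matches OLS for this model
    c(OLS = sigma(ols), ML = ml$sigma, REML = reml$sigma)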
From Quiz 1
· Distinguishing Level 1 and Level 2, as well as variables measured at Level 1 vs. Level 2
From Computer Problem 1
· How residual standard error is computed
o The idea of dividing by different numbers depending on estimation method
· How residual standard error relates to σ̂
From Day 2, you should be able to
· AIC and BIC are alternative “measures of fit” (balance between achieved likelihood and complexity of the model (df)) and can be used to compare models
· Define and evaluate the basic regression model assumptions (LINE) for inference with least squares regression (in context)
o Can also use residual plots to check for patterns/relationships
· Explain how SE(b1) measures sample to sample variation in b1 values from different random samples
o For multiple regression the formula also involves the other predictors; it simplifies to SE(b1) = s/(sx√(n – 1)) for simple regression (1 EV)
o Explain that SE(b1) depends on σ, n, and SD(X).
§ In multiple regression, also depends on the linear correlation among predictor variables (“variance inflation factor”)
· Carry out a Likelihood Ratio Test (LRT) to compare nested models
o 2 × (difference in log-likelihood values)
o Compare to chi-square distribution with df = difference in number of parameters between the two models
· Consider whether one model is “nested” in another
o Can you get from one model to the other by setting some of the parameters to 0? (see the sketch below)
o Can sometimes still get a “rough comparison” of non-nested models
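A by-hand LRT sketch in R (simulated data; logLik() on lm objects returns the ML log-likelihood):

    set.seed(3)
    d <- data.frame(x = rnorm(60))
    d$y <- 1 + 0.5 * d$x + rnorm(60)

    m0 <- lm(y ~ 1, data = d)    # reduced model (slope set to 0)
    m1 <- lm(y ~ x, data = d)    # full model; m0 is nested in m1

    lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))   # 2 x difference in log-likelihoods
    pchisq(lrt, df = 1, lower.tail = FALSE)            # df = difference in # of parameters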
From Quiz 2
· Short-cut decision for large |t| value > 2; short-cut decision for large F value > 4
· What influences SE(b1), the accuracy of our estimate of a slope coefficient
· When counting parameters, REML/ML count the σ² parameter, OLS does not
From Computer Problem 2
· Centering changes the intercept, not the slope
· By-hand likelihood ratio test = 2 × (difference in log-likelihood values) vs. chi-square distribution with df = difference in number of parameters in the two models
From HW 1
· Defining meaningful variables for a given context (e.g. time vs. speed vs. adjusting for track length)
· Suggest possible remedies for violations of basic model assumptions
o Transformations of y and/or x can improve linearity
o Transformations of y can improve normality and equal variance
o Including polynomial terms can model curvature
· Contrast the behavior between a log transformation and a polynomial model
· Understand impacts of centering a variable on interpretation, significance of model
· Identify benefits of centering a variable
o Interpretation of intercept
o Reduce multicollinearity of “product” terms (e.g., x and x²)
· Visually comparing graphs of the data to models, smoothers
o Don’t call all models “lines”
· Define multicollinearity and its consequences
o explanatory variables have a strong linear association
· Use VIF values to measure the amount of multicollinearity in a model
o VIF = variance inflation factor, computed from the R² from regressing one explanatory variable on all the others: VIF = 1/(1 – R²)
o Variables with high VIF will have inflated standard errors/misleading information about the significance of the variable in the model
o Usually VIF values larger than 5 or 10 are concerning
o Remedies include reexpressing variables, removing variables, centering (see the VIF sketch below)
§ Standardizing can also make regression coefficients more comparable by putting the variables on the same SD scale
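A sketch of checking multicollinearity with car::vif (fabricated predictors, where x2 is nearly a copy of x1):

    library(car)   # provides vif()
    set.seed(4)
    d <- data.frame(x1 = rnorm(100))
    d$x2 <- d$x1 + rnorm(100, sd = 0.2)   # strong linear association with x1
    d$y  <- d$x1 + rnorm(100)

    vif(lm(y ~ x1 + x2, data = d))   # VIF_j = 1/(1 - R^2_j); values > 5 or 10 are concerning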
From Day 3, you should be able to
· Interpret R’s Scale-Location plot as a measure of heteroscedasticity
o A linear association in this graph suggests a trend in the magnitude of the residuals vs. the fitted values
o Some methods (e.g., the Breusch-Pagan test) provide a p-value for the strength of evidence of a linear association between the absolute residuals and x.
· Explain the principle of weighted least squares to address heteroscedasticity in residuals
o e.g., Var(Yi) = σ²/wi
o Giving less weight in the estimation of the “best fit” line to observations that are measured with more uncertainty
o Use standardized residuals in residual plots to evaluate the appropriateness of the model
· Other remedies include
o Estimating the variance terms from the residuals
· Modelling a more complicated variance-covariance matrix (generalized least squares)
o WLS is a special case
o Can often use LRTs to compare model performance
· Predict how the fitted model will change when weights are applied (see the sketch below)
o Predict how prediction intervals will change
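A small weighted least squares sketch (data simulated so that SD(Y) grows with x, making weights of 1/x² a plausible choice):

    set.seed(5)
    d <- data.frame(x = runif(60, 1, 10))
    d$y <- 2 + 3 * d$x + rnorm(60, sd = d$x)        # fanning: SD proportional to x

    wfit <- lm(y ~ x, data = d, weights = 1 / x^2)  # weight ~ 1/variance: noisier points count less
    plot(fitted(wfit), rstandard(wfit))             # use STANDARDIZED residuals to re-check equal variance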
From Quiz 3
· Predicting how a weighted regression model might differ from the unweighted model based on which observations receive more/less weight
· Understanding the fixed vs. random terms in our model equation
o If we start with E(Y), the “error term” is not there
o E(Y) represents the “expected” or “true population mean” response
o ŷ is both the predicted response and the predicted population mean
· What influences SE(b1), the accuracy of our estimate of a slope coefficient
From Computer Problem 3
· Counting parameters, comparing models (remember the slope is the parameter, not the variable)
· Understanding different choices of weights
From Day 4, you should be able to
· Explain the idea of “sandwich estimators” for “correcting” the standard errors of the coefficients when heteroscedasticity is present
· Add a categorical variable to a regression model using k – 1 terms for a variable with k categories
o e.g., E(Y) = μ + αk, where αk is the kth “group effect”
o Indicator variables change intercepts
o Interpret coefficients as changes in intercepts (e.g., parallel lines, at any value of x)
· Use either effect coding or indicator coding (see the sketch below)
o Determine the value of the coefficient not displayed in the output
o Interpret signs of coefficients (comparing to overall mean or to reference group)
o Interpretation of intercept (reference group vs. overall mean)
· Explain and evaluate model assumptions (normality, equal variance) with categorical data
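A sketch contrasting indicator (contr.treatment, R’s default) and effect (contr.sum) coding for a k = 3 group factor:

    set.seed(6)
    g <- factor(rep(c("A", "B", "C"), each = 10))
    y <- rnorm(30, mean = rep(c(10, 12, 15), each = 10))

    coef(lm(y ~ g))   # indicator: intercept = reference group (A) mean; slopes = differences from A
    coef(lm(y ~ g, contrasts = list(g = "contr.sum")))
                      # effect: intercept = overall (unweighted) mean;
                      # the undisplayed 3rd effect = -(sum of the displayed effects)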
From Quiz 4
· Effect (contr.sum, aka “sum to zero”) vs. indicator coding (contr.treatment)
o Seeing this in the code is one way to know which way the variables have been coded
· Interpreting slopes and intercepts with categorical variables
From Computer Problem 4
· Interpreting models
· Using residual plots to check the validity of a model
· Seeing the impact on the model when using weighted regression
· Remember to use standardized residuals to assess the equal variance assumption with weighted LS
· HC-corrected standard errors are another way to “fix” the standard errors in the presence of heteroscedasticity. This again is like using the observed variance estimates to tweak the matrix that computes the standard errors.
From HW 2
· Utilize residual plots to evaluate validity conditions
o Identify which plot to use and what you learn from the plot
o The need for an interaction could show up as “curvature” in residual plots
o Also review study design
· Understand how different weighted regression models estimate variances
o e.g., given the model output, calculate the estimated variance
o Evaluate “significant improvement in model fit” and “improving the validity of the model” (these are judged differently)
· Pros and cons of variable transformations
o Only transforming y will impact “distributional assumptions”
o Transforming only one variable impacts only that variable’s linear relationship with y (e.g., if all relationships are nonlinear vs. just one)
o Explain that transformations may not be able to “solve” more complicated unequal variance patterns
§ Consider whether Var(Y) is changing with an explanatory variable (e.g., increasing variability with larger values of x)
§ Explain the principle of obtaining an estimate of Var(Yi) when we have unequal variance
· e.g., Var(Yi) = σ²/wi
· This can impact judgment of significance of terms in the model, predictions, etc.
· Distinguish between a confidence interval for E(Y) and a prediction interval for y (see the sketch below)
o Population regression line vs. variability about the line
o Identify a 95% prediction interval for an individual observation as roughly ŷ ± 2s
o What does/does not impact the widths of these intervals
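A sketch of getting both intervals from predict() (simulated data; x = 5 is an arbitrary new value):

    set.seed(7)
    d <- data.frame(x = runif(40, 0, 10))
    d$y <- 1 + 2 * d$x + rnorm(40)
    fit <- lm(y ~ x, data = d)

    new <- data.frame(x = 5)
    predict(fit, new, interval = "confidence")   # for E(Y) at x = 5 (narrower)
    predict(fit, new, interval = "prediction")   # for a new individual y; roughly y-hat +/- 2s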
From Day 5, you should be able to
· Interpret an added-variable plot (residuals of y on x1, …, xp vs. residuals of xp+1 on x1, …, xp) to understand whether xp+1 should be added to the model and in what form
· Interpret the nature of the adjusted association
o From graph
o From multiple regression model
· Interpret intercept and slope coefficients for multiple regression model in context
o Multiple regression: After adjusting for other variables in the model (e.g., comparing individuals in the same sub-population)
§ Effect of x1 on y can differ when x2 is in the model
§ Adjusted and unadjusted relationships can look very different
o Identify the other variables you are talking about
o Quote: We interpret the regression slopes as comparisons of individuals that differ in one predictor while being at the same levels of the other predictors. In some settings, one can also imagine manipulating the predictors to change some or hold others constant—but such an interpretation is not necessary. This becomes clearer when we consider situations in which it is logically impossible to change the value of one predictor while keeping the value of another constant (e.g., x and x²).
· Distinguish the meaning of sequential vs. adjusted sums of squares
· Carry out test of significance involving individual or groups of slope coefficients
o Stating appropriate hypotheses for removing the variable
o Use a “partial F-test” to test the significance of a categorical variable or a subset of variables
§ Null is all k – 1 coefficients are zero; Alt. is at least one coefficient differs from 0
§ Compares the increase in SSError with the reduced model to the MSE for the full model to see whether the reduced model is significantly worse (df num = k – 1, df denom = MSE df for full model)
o Special cases of partial F-tests:
§ One coefficient (equivalent to t-test as last variable entered into model)
§ All coefficients (“model utility test”)
o Use “anova(model1, model2)” to carry out a partial F-test in R.
· Calculate and interpret an intraclass correlation coefficient from an ANOVA table (see the sketch below)
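A sketch (simulated groups) of computing the ICC from a one-way ANOVA table when all groups have the same size n:

    set.seed(8)
    n <- 5                                  # common group size
    g <- factor(rep(1:8, each = n))
    y <- rnorm(40, mean = rep(rnorm(8, 50, 2), each = n), sd = 3)

    tab <- anova(lm(y ~ g))
    MSG <- tab["g", "Mean Sq"]; MSE <- tab["Residuals", "Mean Sq"]
    (MSG - MSE) / (MSG + (n - 1) * MSE)     # ICC-hat = tau2-hat / (tau2-hat + sigma2-hat)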
From Quiz 5
· Adjusted associations can be very different from unadjusted associations
· The R² for a model with two variables could be smaller than the sum of the R² values from the one-variable models if the predictor variables explain some of the same variability in the response
· Between group variation vs. within group variation
From Computer Problem 5
· Interpretations when you have more than one variable in the model
· Use a partial F-test to drop “some” term(s) from the model
· Plots of residuals vs. new x-variable(s) can help you decide whether the new x-variable has something to tell you about the response variable that the current model has not
o Going the further step to an “added variable plot” (by regressing residuals vs. residuals) tells you exactly what the adjusted slope coefficient will be.
· Bottom line: make sure you can make sense of the phrase “after adjusting for…”
From Day 6, you should be able to
· Use generalized least squares to add a correlation parameter to an OLS model to model within-group correlation
· Explain the basic principles behind treating a factor as “random effects” rather than “fixed effects”
o Identify some benefits of random effects models
§ Degrees of freedom
§ Observed “levels” are representative of a larger population (random sample?)
§ Want to make inferences to the larger population
§ Use of Level 2 units + Level 2 variables in the same model
§ Compute the ICC in terms of random effects (between-group vs. total variation)
o Standard errors reflect the randomness at the individual level and from the random effect (e.g., sampling error)
· Fit and interpret the “random intercepts” model
o Like adding the “grouping variable” to the model but treating it as random effects
o “may be all that is required to adequately account for the nested nature of the grouped data.”
o “there is virtually no downside to estimating mixed-effect models even when [the ICC] is small or non-significant because in these cases the mixed-effect models just return the OLS estimates” (Bliese et al., 2018)
o Also known as the “unconditional means” or “null” model or “one-way random effects ANOVA” (assumes equal group sizes)
· Explain the principle of shrinkage/pooling as it pertains to multilevel models (see the sketch below)
o μ̂j = wj·ȳj + (1 – wj)·ȳ, where wj = τ²/(τ² + σ²/nj)
o No pooling ↔ weights = 1 (treating as fixed effects)
o Complete pooling ↔ weights = 0 (no group-to-group variation)
o Identify and explain factors that impact the size of the weights/consequences/amount of shrinkage
o Determine when (sample size) an estimated mean will be closer to the group mean vs. the overall mean
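A random intercepts sketch with lme4 (simulated groups): fit the null model, compute the ICC, and compare the shrunken group estimates to the raw group means:

    library(lme4)
    set.seed(9)
    d <- data.frame(g = factor(rep(1:10, each = 6)))
    d$y <- rnorm(60, mean = rep(rnorm(10, 50, 4), each = 6), sd = 3)

    m <- lmer(y ~ 1 + (1 | g), data = d)   # "unconditional means"/null model
    vc <- as.data.frame(VarCorr(m))        # row 1 = tau2 (group), row 2 = sigma2 (residual)
    vc$vcov[1] / sum(vc$vcov)              # ICC = tau2 / (tau2 + sigma2)

    cbind(raw    = tapply(d$y, d$g, mean), # no pooling: raw group means
          shrunk = coef(m)$g[, 1])         # partial pooling: pulled toward the overall mean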
From Quiz 6/Computer Problem 6
· Understanding the “weights” used in “partial pooling” and what makes them larger/smaller and how this changes the estimate of the group mean (between the observed group mean and the ‘overall’ mean)
· Q5: If adding/removing a variable from the model changes other coefficients, then that variable is related to other variables in the model (not orthogonal)
From HW 3
· Recognize grand mean centering (and impacts on model) vs. group mean centering
· Interpreting confidence intervals for random components
· Compute the effective sample size of a study based on the ICC, (common) group size m, and number of groups (effective n ≈ total n / (1 + (m – 1)·ICC))
o Compromise between group size and number of groups
· Discuss impact on standard errors
From Day 7
· Write out the random intercepts model: Yij = β0 + uj + εij, where uj ~ N(0, τ²) independently of εij ~ N(0, σ²)
· Discuss consequences of treating a factor as random effects
o Impact on model equations (now estimating one variance parameter, τ², rather than separate fixed group effects)
o Total Var(Y) = τ² + σ²
o ICC = τ²/(τ² + σ²)
o Variance-covariance matrix for the observations
o Impact on standard errors of coefficients (adjusting for dependence)
· Compute and interpret the intraclass correlation coefficient as the expected correlation between two randomly selected individuals from the same group
o Show how this correlation coefficient is “induced” by using the random intercepts model
· Be able to explain/interpret the variance-covariance matrix for the responses
Covariance | Site 1 (beach 1) | Site 2 (beach 1) | Site 3 (beach 2) | Site 4 (beach 2)
Site 1 (beach 1) | τ² + σ² | τ² | 0 | 0
Site 2 (beach 1) | τ² | τ² + σ² | 0 | 0
Site 3 (beach 2) | 0 | 0 | τ² + σ² | τ²
Site 4 (beach 2) | 0 | 0 | τ² | τ² + σ²
Correlation | Site 1 (beach 1) | Site 2 (beach 1) | Site 3 (beach 2) | Site 4 (beach 2)
Site 1 (beach 1) | 1 | τ²/(τ² + σ²) | 0 | 0
Site 2 (beach 1) | τ²/(τ² + σ²) | 1 | 0 | 0
Site 3 (beach 2) | 0 | 0 | 1 | τ²/(τ² + σ²)
Site 4 (beach 2) | 0 | 0 | τ²/(τ² + σ²) | 1
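A sketch building the induced 4 x 4 matrices above in R, with hypothetical variance components τ² = 4 and σ² = 9:

    tau2 <- 4; sig2 <- 9        # hypothetical between- and within-group variances
    beach <- c(1, 1, 2, 2)      # sites 1-2 on beach 1, sites 3-4 on beach 2

    V <- tau2 * outer(beach, beach, "==") + sig2 * diag(4)
    V             # diagonal = tau2 + sig2; same-beach off-diagonals = tau2; 0 otherwise
    cov2cor(V)    # same-beach correlations = ICC = tau2/(tau2 + sig2); 0 otherwise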
· Assess the statistical significance of the variation among Level 2 units
o Stating null and alternative hypotheses
o Fixed effects ANOVA
o Confidence intervals for variance components
o Likelihood ratio test (MLE or REML approach)
§ Cut p-value in half?
· Understand that we don’t normally make the fixed vs. random decision based on a likelihood ratio test, but instead consider the study design/research question of interest
· Identify Level 1 and Level 2 from study design/research question
· Write out the statistical model for a “random intercepts” model (β’s, uj, εij, σ², τ²)
o Interpret model components
o Define indices
o Interpretation of intercept
o Standard deviations vs. variances vs. covariances
o Composite equation vs. Two-level equations
· Identify number of units at each level
· Interpret R output (lme, lmer)
o How to use nlme::lme and lme4::lmer in R
o Is output reporting standard deviations (σ, τ) or variances (σ², τ²)?
· Explain the distinction between σ² and τ²
From Quiz 7
· Think of the Level 2 units as a sample from a larger population with variance τ²
· Using the normal distribution to predict other judges
· Understanding how individual effects are calculated and what factors impact the precision of those estimates (formula for ûj)
From Computer Problem 7
· Keep track of lme vs. lmer
· Different ways of interpreting ICC
· Q5 (e): Want to estimate the variability behind one group’s mean (and I’m hand-waving the distinctions among the versions below)
o The group’s own SD, sj, is what I would use if all I had were the 5 observations
o The pooled SD, sp, is what I use if I am assuming equal variances across groups and want to use the pooled standard deviation as the residual standard error
o The mixed-model estimate is what I use with a mixed effects model (again, pooling across groups, taking group size into account, but with shrinkage)
From Day 8, you should be able to
· Add a Level 1 variable to the random intercepts model
o Interpretation of “adjusted” association, parameter estimates
§ Also remember either to zero out the random effect (uj = 0, a “typical” group) or to “fix” a Level 2 unit (e.g., for a particular school)
o Visualize the model (e.g., parallel lines)
o Level 1 and Level 2 equations, composite equation
o R2 type calculations
o Assessing significance (t-test vs. likelihood ratio test)
· Explain the distinction between variation explained by fixed effects vs. by random effects
· Explain the distinction between total variation and variation explained by random effects after adjusting for fixed effects
· Calculate percentage change in Level 1, Level 2, and overall variation after including a variable
· If a variable coming into the model is related to the Level 2 units, the variation explained will depend on the nature of that relationship (similar to adjusted associations in regular regression models)
· Include a Level 1 variable and reinterpret coefficients, variance components as adjusted for that variable
§ Also remember either to zero out the random effect (uj = 0, a “typical” group) or to “fix” a Level 2 unit (e.g., for a particular school)
· Distinguish Level 1 vs. Level 2 predictors
· Write out “level equations” and “composite” equation using appropriate symbols and indices
o Use estimated equations (fixed and random effects) to predict a response value
· Include a Level 2 variable to (ideally) explain variation in intercepts
o Interpreting an aggregated group mean variable (“if the group mean IQ increases…”)
o Slope coefficients with x and x̄ vs. slope coefficients with x – x̄ and x̄ (see the table and example below)
Model includes | Can explain variation at
x | Level 1 and Level 2
x̄ | Level 2
x – x̄ | Level 1 (“pure Level 1”)
x and x̄ | Level 1 (only) and “emergent” effect at Level 2
x – x̄ and x̄ | Level 1 (only) and Level 2 (separated)
For example, if I have ŷ = β0 + 3x + 4x̄:
Interpreting the slope when the group mean increases by one:
When x̄ = m, ŷ = β0 + 3x + 4m
When x̄ = m + 1 (each x in the group also shifts up by one), ŷ = β0 + 3(x + 1) + 4(m + 1)
Slope = change in response when EV increases by one: 3 + 4 = 7
On the other hand, if I have ŷ = β0 + 3(x – x̄) + 4x̄:
When x̄ = m, ŷ = β0 + 3(x – m) + 4m
When x̄ = m + 1 (x – x̄ unchanged), ŷ = β0 + 3(x – m) + 4(m + 1)
Slope = change in response when EV increases by one: 4
So the group mean can increase but the relative distribution of all the observations around the group mean stays the same.
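A sketch of the two parameterizations in lme4, with data simulated to have within slope 3 and between slope 4 to match the example above:

    library(lme4)
    set.seed(10)
    d <- data.frame(g = factor(rep(1:12, each = 5)), x = rnorm(60))
    d$xbar <- ave(d$x, d$g)     # group mean of x
    d$y <- 2 + 3 * (d$x - d$xbar) + 4 * d$xbar +
           rep(rnorm(12), each = 5) + rnorm(60, sd = 0.5)

    fixef(lmer(y ~ I(x - xbar) + xbar + (1 | g), data = d))  # within ~ 3, between ~ 4
    fixef(lmer(y ~ x + xbar + (1 | g), data = d))            # x ~ 3; xbar ~ between - within ~ 1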
· Use residual plots to analyze whether the new model is more valid for the data in hand
o Look at graphs of model vs. data to assess appropriateness of model
· Define/interpret residuals/investigate possible causes of large residuals
From Quiz 8
· Reading standard output for multilevel modeling
o Often: estimate (se)
· Relationship between models with a group-mean-centered variable
From Computer Problem 8
· Interpreting R code (e.g., ifelse statements)
· Predicting relationships based on context
· Interpreting random effects (“Level 2 residuals”)
· Expected changes in variance components when adding Level 1 and Level 2 variables
From HW 4
· Interpreting all components of a multilevel model
o Interpreting standard errors of effects
o Adjusted vs. unadjusted associations
o Between vs. within group variability
· Distinguish different types of “feature scaling” (e.g., centering vs. standardizing vs. min-max scaling)
o What are the benefits to using them?
o How do they change interpretation?
Also
· Explain the difference between “within group” and “between group” associations
o Aggregating vs. disaggregating data
o Between-group regression focuses on the association between ȳj and x̄j, the group means
o The within-group regression focuses on the association between y and x within each group, assuming it’s the same within each group
o The between-group regression slope may look very different (even opposite in direction) from the within-group regression slope
o Regular regression can be written as a combination of the within and between group regressions, e.g., E(Yij) = β0 + β1xij + β2x̄j
§ β1 matches the within-group regression coefficient
§ β2 is the between-group coefficient minus the within-group coefficient (the difference between them)
Keeping track of variances
· sy = variation in response variable
· sx = standard deviation of explanatory (aka predictor) variable
· σ² = variation about regression line/unexplained variation in regression model; variation in response at a particular x value
o σ̂ aka s aka se = residual standard error = square root of the mean squared residual (“root MSE”)
· SE(b1) = sample to sample variation in regression slope
· SE(fit) = sample to sample variation in the estimated predicted value. There are actually two “se fit” values, one for a confidence interval to predict E(Y) and one for a prediction interval to predict an individual y. The latter can be approximated with ŷ ± 2s, but actually depends on SE(fit), the distance from x̄, etc.