A Data-Oriented, Active Learning, Post-Calculus Introduction to Statistical Concepts, Methods, and Theory
Preliminary Content Outline, July 2000
Some Principles:
- Motivate with real data, problems
- Foster active explorations by students
- Make use of mathematical competence to investigate underpinnings
- Use a variety of computational tools
- Develop an assortment of problem-solving skills
- Experience the process of statistical investigation, from data collection through analysis to inference/interpretation, over and over in new settings
- Stress that the data collection design determines the type of analysis and interpretation
- Emphasize themes of comparison, estimation, prediction
- Use simulations (tactile and technology-based) throughout
- Introduce probability "just in time"
Format:
Roughly half activities, half exposition
- Introduce probability through "detours"
- Designate activities requiring Calculus, Technology
- Identify routine exercises as "Practice"
- Maybe also "S" for simulation, "C" for conceptual?
Sequencing:
Change scenarios "one component at a time"
- Repeatedly model statistical process from data collection through inference
- Revisit important ideas often
- Emphasize concepts first, then techniques, then theory
- Outline:
- Start with comparing two categorical variables, small samples, experimental setting
- Then change to observational studies, keeping the rest the same
- Then change to one-sample
- Then move to large samples
- Then repeat these scenarios for estimation rather than testing
- Move from categorical to quantitative variables, repeat all of the above
- Then move to issues of bivariate analysis, association, prediction
(The first course would probably end about here.)
- Then consider additional probability distributions, models, theory of estimation
- Then address theory of testing
- Conclude with linear prediction models
Chapter 1: Variation, Randomness, and Comparisons
Introduce idea of statistical significance in a setting of comparing experimental groups
- Simulation of randomization test for 2x2 table
- Segmented bar graph
- Hypergeometric probability distribution, Fisher's exact test
- Probability as relative frequency, sample space, equal likeliness, counting rules
- Random variable, expectation
- Odds ratio
(Scenario: categorical variables, two groups, small samples, experiment, comparison)
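As a possible technology companion to the Chapter 1 activities, here is a minimal Python sketch (standard library only) relating the hypergeometric distribution, Fisher's exact test, the simulated randomization test, and the odds ratio. It assumes a 2x2 table with counts a, b (group A successes/failures) and c, d (group B); all function names are hypothetical, chosen for illustration. The simulated p-value should approach the exact one as the number of repetitions grows.

```python
import math
import random


def hypergeom_pmf(k, N, K, n):
    """P(X = k): n units drawn without replacement from N units, K of them successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def fisher_exact_upper(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a), X = successes in group A, margins fixed.

    Hypothetical 2x2 layout:   success  failure
                     group A      a        b
                     group B      c        d
    """
    N, K, n = a + b + c + d, a + c, a + b
    return sum(hypergeom_pmf(k, N, K, n) for k in range(a, min(K, n) + 1))


def randomization_test_upper(a, b, c, d, reps=10_000, seed=0):
    """Simulated randomization test: re-deal the observed outcomes to groups at random."""
    rng = random.Random(seed)
    n = a + b                                  # size of group A
    outcomes = [1] * (a + c) + [0] * (b + d)   # 1 = success, 0 = failure
    hits = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        if sum(outcomes[:n]) >= a:             # first n shuffled units form group A
            hits += 1
    return hits / reps


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a/b) / (c/d); undefined when b or c is zero."""
    return (a * d) / (b * c)
```

Holding both margins fixed is what makes the hypergeometric model apply: under the null hypothesis the outcomes would have been the same regardless of group, so only the random assignment varies.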
Chapter 2: Observation, Confounding, Causation
Compare/contrast conclusions to be drawn from controlled experiments vs. observational studies
- Confounding
- Experimental design principles: comparison, randomization, blindness, replication
- Importance, goals, properties of randomization
- Three-way tables, Simpson's paradox
- Types of variables
- Blocking, randomized block design
(Scenario: categorical variables, two groups, small samples, observational study, comparison)
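The randomization and blocking principles in Chapter 2 can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not a prescribed procedure: the function names, block labels, and subject labels are hypothetical. Randomizing separately within each block guarantees every block contributes equally to each treatment, so the blocking variable cannot be confounded with the treatment.

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle the units, then deal them to the treatments in turn (near-equal group sizes)."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomized_block(blocks, treatments, seed=None):
    """Randomized block design: carry out a separate random assignment inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        assignment.update(completely_randomized(units, treatments, rng))
    return assignment


# Hypothetical example: eight subjects blocked by age group.
blocks = {"younger": ["s1", "s2", "s3", "s4"], "older": ["s5", "s6", "s7", "s8"]}
print(randomized_block(blocks, ["treatment", "control"], seed=1))
```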
Chapter 3: Sampling
Introduce idea of random sampling and its associated concepts, binomial model
- Population, sample
- Sampling, random sampling, bias, precision
- Binomial probability distribution, approximation to hypergeometric
- Binomial process, probabilistic independence, multiplication rule
- Simulations; shape, center, spread of distributions
- Test of significance: Reasoning, structure, terminology, notation
- Types of error, power
(Scenario: categorical variables, one group, small samples, comparison)
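For Chapter 3, a minimal Python sketch (standard library only) of an exact binomial test together with its Type I error rate and power. It assumes a one-sided test of H0: p = p0 against Ha: p > p0; the function names are hypothetical. Because the binomial is discrete, the achieved Type I error rate is usually below the nominal alpha.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p); this is the exact one-sided p-value for x successes."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


def rejection_cutoff(n, p0, alpha=0.05):
    """Smallest count c with P(X >= c | p = p0) <= alpha, for H0: p = p0 vs Ha: p > p0."""
    for c in range(n + 2):      # c = n + 1 (empty tail, probability 0) always qualifies
        if upper_tail(c, n, p0) <= alpha:
            return c


def rejection_probability(n, p0, p, alpha=0.05):
    """P(reject H0) when the true proportion is p: the Type I error rate if p = p0, power if p > p0."""
    return upper_tail(rejection_cutoff(n, p0, alpha), n, p)
```

For example, with n = 10 and p0 = 0.5 the test rejects when X >= 9, so the achieved Type I error rate is 11/1024, and the power against p = 0.8 is about 0.376.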
Chapter 4: Large-Sample Approximations
Study normal approximations to above analyses for large samples
- Continuous probability distributions, expectation
- Normal approximation to binomial
- z-test for single proportion
- z-test for comparing proportions
(Scenario: categorical variables, large samples, comparison)
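The Chapter 4 z-tests can be sketched with Python's standard `statistics.NormalDist`. This is a minimal illustration with hypothetical function names, returning two-sided p-values. The one-sample test uses the standard error computed under H0, and the two-sample test uses the pooled proportion. A common rule of thumb is to rely on the normal approximation only when the expected counts of successes and failures are at least about 10.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def one_proportion_z(x, n, p0):
    """z-test of H0: p = p0; returns (z, two-sided p-value)."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed assuming H0 is true
    z = (p_hat - p0) / se
    return z, 2 * (1 - Z.cdf(abs(z)))


def two_proportion_z(x1, n1, x2, n2):
    """z-test of H0: p1 = p2 using the pooled proportion; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - Z.cdf(abs(z)))
```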
Chapter 5: Estimation
Introduce concept of confidence, interval estimation; apply to situations studied thus far
- Idea of confidence as plausibility, long-term interpretation
- Properties of CIs, sample size determination
- Exact binomial CIs, empirical rule, z-intervals
- "Agresti" intervals with shrinkage estimator
- Duality with tests, empirically and theoretically (pivot)
- CIs for difference in proportions, odds ratio
(Scenario: categorical variables, estimation)
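A minimal Python sketch (standard library only) contrasting the usual z-interval for a proportion with an Agresti-Coull style interval based on a shrinkage estimator, plus a z-interval for a difference in proportions. The function names are hypothetical, and clipping the endpoints to [0, 1] is a design choice made here rather than something the outline specifies. With x = 0 successes the ordinary z-interval collapses to the single point 0, while the adjusted interval does not.

```python
import math
from statistics import NormalDist


def z_star(conf):
    """Critical value z* with central area conf under the standard normal curve."""
    return NormalDist().inv_cdf(0.5 + conf / 2)


def wald_interval(x, n, conf=0.95):
    """Large-sample z-interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    half = z_star(conf) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def agresti_coull_interval(x, n, conf=0.95):
    """Add z*^2/2 successes and z*^2/2 failures, then apply the same z-interval form.

    At 95% confidence z*^2 is about 4, i.e. roughly "add two successes and two failures".
    The shrinkage estimator p_tilde pulls p_hat toward 1/2; endpoints are clipped to [0, 1].
    """
    z = z_star(conf)
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)


def diff_interval(x1, n1, x2, n2, conf=0.95):
    """Unpooled z-interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half = z_star(conf) * se
    return (p1 - p2) - half, (p1 - p2) + half
```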
Chapter 6: Quantitative Variables
Repeat all of the above analyses (graphical, numerical, inferential) with quantitative variables
- histograms, stemplots, boxplots
- numerical summaries, minimization criteria, five-number summary (FNS)
- review center, spread, shape, outliers, peaks/clusters/gaps, granularity
- comparisons, tendencies
- sampling distribution of mean, CLT
- t-statistic, distribution, comparison with z-procedures
- t-procedures
- checking normality, normal probability plots
- robustness
- transformations
- prediction as well as confidence intervals?
- matched pairs
- sign test, signed rank test?
- inference for median?
(Scenario: quantitative variables)
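Several Chapter 6 ideas (the t-statistic, matched pairs, and the sampling distribution of the mean) can be sketched with Python's `statistics` module. This is an illustrative sketch with hypothetical function names. It computes t and its degrees of freedom but not p-values, which would require a t table or a library such as `scipy.stats.t`. The simulation draws from an exponential population with mean 1 and SD 1, so by the CLT the sample means should center near 1 with SD near 1/sqrt(n).

```python
import math
import random
import statistics


def t_statistic(sample, mu0=0.0):
    """One-sample t = (xbar - mu0) / (s / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD with divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def paired_t(before, after):
    """Matched pairs: a one-sample t-statistic on the within-pair differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    return t_statistic(diffs, 0.0)


def simulate_sample_means(n, reps=5000, seed=0):
    """CLT illustration: means of samples of size n from a skewed exponential(mean 1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
```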
Chapter 7: Bivariate Data, Association, Prediction
Investigate concepts related to association and prediction, emphasize model basics (data = fit + residual), apply in specific settings
- scatterplots, association, correlation
- least squares regression, fit and residual, prediction, outlier, influence
- diagnostics, residual plots
- transformations
- logistic regression
- r x c tables, conditional probability, independence, chi-square analysis
- one-way ANOVA, residuals, multiple comparisons
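A minimal Python sketch (standard library only) of the "data = fit + residual" structure for least squares regression, along with correlation and the Pearson chi-square statistic for an r x c table. The function names are hypothetical. Least squares residuals always sum to zero (up to rounding), which gives students a quick check on their fitted line.

```python
import math
import statistics


def _sums(x, y):
    """Centered sums of cross-products and squares: Sxy, Sxx, Syy."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return xbar, ybar, sxy, sxx, syy


def least_squares(x, y):
    """Slope and intercept minimizing the sum of squared residuals."""
    xbar, ybar, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def residuals(x, y):
    """Residual = observed - fitted, so that data = fit + residual."""
    slope, intercept = least_squares(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def chi_square_statistic(table):
    """Pearson X^2 = sum of (observed - expected)^2 / expected for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```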
Chapter 8: Probability Models, Distributions
Study "catalog" of common distributions as models, introduce estimation principles
- "catalog" of common distributions, parameter effects, modeling data
- Q-Q plots
- goodness of fit tests
- parameter estimation methods: maximum likelihood, method-of-moments
- parameter estimation criteria: unbiasedness, efficiency, MSE
- simulation-based methods: bootstrap, jackknife
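To connect the estimation methods and criteria in Chapter 8, here is a Python sketch (standard library only) that compares two estimators of theta for a Uniform(0, theta) model and computes bootstrap and jackknife standard errors. The uniform model and function names are illustrative choices and are not taken from the outline. The sample maximum (the maximum likelihood estimator) is biased low, with bias -theta/(n+1). The method-of-moments estimator 2 * xbar is unbiased. The maximum nevertheless has the smaller MSE: 2 theta^2/((n+1)(n+2)) versus theta^2/(3n).

```python
import random
import statistics


def mle_uniform(sample):
    """Maximum likelihood estimate of theta for Uniform(0, theta): the sample maximum."""
    return max(sample)


def mom_uniform(sample):
    """Method-of-moments estimate: E[X] = theta / 2, so theta_hat = 2 * xbar."""
    return 2 * statistics.fmean(sample)


def simulate_bias_mse(estimator, theta=1.0, n=10, reps=20_000, seed=0):
    """Estimate bias and mean squared error of an estimator by simulation."""
    rng = random.Random(seed)
    estimates = [estimator([rng.uniform(0, theta) for _ in range(n)]) for _ in range(reps)]
    bias = statistics.fmean(estimates) - theta
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return bias, mse


def bootstrap_se(sample, statistic, reps=2000, seed=0):
    """Bootstrap SE: resample with replacement, recompute the statistic, take the SD."""
    rng = random.Random(seed)
    n = len(sample)
    boot = [statistic(rng.choices(sample, k=n)) for _ in range(reps)]
    return statistics.stdev(boot)


def jackknife_se(sample, statistic):
    """Jackknife SE from the n leave-one-out recomputations of the statistic."""
    n = len(sample)
    loo = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    center = statistics.fmean(loo)
    return ((n - 1) / n * sum((v - center) ** 2 for v in loo)) ** 0.5
```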
Chapter 9: Theory of Testing, Decision
Investigate more theoretical aspects of testing and decision theory
- Neyman-Pearson paradigm, likelihood ratio tests, uniformly most powerful tests
- distributions of test statistics
- nonparametric test statistics?
- decision elements: actions, states of nature, loss functions, information
- Bayesian paradigm, Bayesian analysis of binomial, normal models?
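Two of the Chapter 9 topics can be sketched for the binomial model in Python (standard library only): a likelihood ratio test and a conjugate Bayesian analysis. The function names are hypothetical. The LRT p-value relies on the large-sample chi-square(1) approximation for -2 log(Lambda), computed here as P(Z^2 > G). The Bayesian part assumes a Beta(a, b) prior, which updates to a Beta(a + x, b + n - x) posterior. Under squared-error loss, the Bayes estimate is the posterior mean.

```python
import math
import random
from statistics import NormalDist


def _xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_lrt(x, n, p0):
    """G = -2 log(Lambda) for H0: p = p0 vs Ha: p != p0, with approximate chi-square(1) p-value."""
    p_hat = x / n
    g = 2 * (_xlogy(x, p_hat / p0) + _xlogy(n - x, (1 - p_hat) / (1 - p0)))
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(g)))  # P(chi-square(1) > G) = P(|Z| > sqrt(G))
    return g, p_value


def beta_binomial_update(a, b, x, n):
    """Posterior Beta parameters after observing x successes in n trials with a Beta(a, b) prior."""
    return a + x, b + n - x


def posterior_mean(a, b):
    """Mean of Beta(a, b); the Bayes estimate of p under squared-error loss."""
    return a / (a + b)


def posterior_prob_greater(a, b, c, draws=20_000, seed=0):
    """Monte Carlo estimate of P(p > c) for p ~ Beta(a, b)."""
    rng = random.Random(seed)
    return sum(rng.betavariate(a, b) > c for _ in range(draws)) / draws
```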
Chapter 10: Linear Models
Study common structure, applicability of linear models
- Multiple regression
- Two-way ANOVA
- ANCOVA
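The common structure behind Chapter 10 is the least squares fit of y = X b + error. Two-way ANOVA and ANCOVA fit the same framework once categorical factors are coded as indicator (dummy) columns. Below is a minimal Python sketch (standard library only, hypothetical function names) that solves the normal equations (X'X) b = X'y by Gaussian elimination with partial pivoting. In practice a numerical routine such as `numpy.linalg.lstsq` is preferable, because forming X'X can be numerically ill-conditioned.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A z = b."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))  # largest entry in column
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def multiple_regression(X, y):
    """Least squares coefficients [b0, b1, ..., bp] from the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values without an intercept column; a 1 is prepended.
    """
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)
```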