STAT 391 Home

Statistics for Data Science
STAT 391 Spring Quarter 2022

Syllabus

Examples of applications. Probability vs. statistics.
Estimating a distribution
Models, likelihood, and maximum likelihood estimation. Estimation of discrete distributions. Sufficient statistics.
Estimation of small probabilities.
Parametric density estimation: estimating a (univariate) Normal distribution and other named distributions.
Maximizing the log-likelihood by gradient ascent; the logistic density.
Kernel density estimation.
Mixture models.
Evaluating models with statistics and other statistical decisions
Bias and variance. The bias-variance trade-off. Cross-validation.
Model selection for parametric models: BIC, AIC.
The bootstrap.
Confidence intervals.
Statistical estimators as random variables: examples from parametric estimation. Expectation, variance, and asymptotic normality (a consequence of the CLT) of various maximum likelihood (ML) estimators.
Statistical decisions: using Bayes rule and conditional probability for statistical reasoning; statistical decisions with costs.
Hypothesis testing: concepts and simple examples; the likelihood ratio test.
Prediction
Prediction: linear regression.
Prediction: classification. Generative (likelihood ratio) vs. discriminative classifiers. The logistic and nearest-neighbor classifiers.
Optional topics (if there is time)
Dimension reduction: Principal Component Analysis.
Clustering and the EM algorithm.
Streaming data. Active and passive data collection. Experiment design.
Hypothesis testing and model selection.
Models for dependent data: sequences, networks, spatial data.