Statistics for Data Science
STAT 391 Spring Quarter 2021

Syllabus


  • Examples of applications. Probability vs. statistics.
    Estimating a distribution
  • Models, likelihood, and maximum likelihood (ML) estimation. Estimation of discrete distributions. Sufficient statistics.
  • Estimation of small probabilities.
  • Parametric density estimation; estimating a Normal distribution (univariate) and other named distributions.
  • Maximizing the log-likelihood by gradient ascent; the logistic density.
  • Kernel density estimation.
  • Mixture models (ML estimation will not be on the exam).
    Evaluating models with statistics and other statistical decisions
  • Bias and variance. The bias-variance trade-off. Cross-validation.
  • Model selection for parametric models: BIC, AIC.
  • The bootstrap.
  • Confidence intervals.
  • Statistical estimators as random variables: examples from parametric estimation. Expectation, variance, and asymptotic normality (a consequence of the CLT) for various ML estimators.
  • Statistical decisions: using Bayes' rule and conditional probability for statistical reasoning; statistical decisions with costs.
  • Hypothesis testing: concepts and simple examples; the likelihood ratio test.
    Prediction
  • Prediction: linear regression.
  • Prediction: classification. Generative (likelihood ratio) vs. discriminative classifiers. The logistic and nearest-neighbor classifiers.
    Topics not covered (and not required)
  • Dimension reduction: Principal Component Analysis.
  • Clustering and the EM algorithm.
  • Streaming data. Active and passive data collection. Experiment design.
  • Models for dependent data: sequences, networks, spatial data.