Statistics for Data Science
STAT 391 Spring Quarter 2020

Syllabus


  • Examples of applications. Probability vs. statistics.

Estimating a distribution

  • Models, likelihood, and maximum likelihood (ML) estimation. Estimation of discrete distributions. Sufficient statistics.
  • Estimation of small probabilities.
  • Parametric density estimation; estimating a Normal distribution (univariate) and other named distributions.
  • Maximizing the log-likelihood by gradient ascent and the logistic density.
  • Kernel density estimation.
  • Mixture models (ML estimation will not be on the exam).
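
A minimal Python sketch of two of the estimators listed above, under simplifying assumptions: i.i.d. data, the closed-form ML estimates for a univariate Normal, and a kernel density estimate with a Gaussian kernel and a hand-picked bandwidth h. The function names (normal_mle, normal_loglik, gaussian_kde) and the toy sample are illustrative, not taken from the course.

```python
import math


def normal_mle(xs):
    """Closed-form ML estimates (mu_hat, sigma2_hat) for i.i.d. N(mu, sigma^2) data.

    The ML variance divides by n (not n - 1), so it is biased downward.
    """
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of the sample xs under N(mu, sigma2)."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)


def gaussian_kde(xs, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h.

    Returns f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h),
    where K is the standard normal density.
    """
    n = len(xs)
    norm = n * h * math.sqrt(2 * math.pi)

    def f_hat(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs) / norm

    return f_hat


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical toy sample
    mu_hat, sigma2_hat = normal_mle(data)
    print(mu_hat, sigma2_hat)  # 5.0 4.0
    f_hat = gaussian_kde(data, h=1.0)  # bandwidth chosen by hand (assumption)
    print(f_hat(4.0))
```

The bandwidth h controls how smooth the estimate is; in practice it would be chosen by a data-driven method such as cross-validation rather than fixed by hand.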

Evaluating models with statistics and other statistical decisions

  • Bias and variance. The bias-variance trade-off. Cross-validation.
  • Model selection for parametric models: BIC, AIC.
  • The bootstrap.
  • Confidence intervals.
  • Statistical estimators as random variables: examples from parametric estimation. Expectation, variance, asymptotic normality (a consequence of the CLT) for various ML estimators.
  • Statistical decisions: using Bayes rule and conditional probability for statistical reasoning; statistical decisions with costs.
  • Hypothesis testing: concepts and simple examples; the likelihood ratio test.
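
A minimal sketch of two of the tools listed above, assuming i.i.d. data: a percentile bootstrap confidence interval for an arbitrary statistic, and a likelihood ratio test of H0: p = p0 for Bernoulli data that uses the asymptotic chi-square(1) distribution of 2 log(likelihood ratio) under H0. The function names, the number of resamples, and the example numbers are hypothetical choices for illustration.

```python
import math
import random


def bootstrap_percentile_ci(xs, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(xs).

    Draws n_boot resamples of xs with replacement, recomputes the statistic
    on each, and returns the empirical alpha/2 and 1 - alpha/2 quantiles.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(xs)
    reps = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def bernoulli_lrt(k, n, p0):
    """Likelihood ratio test of H0: p = p0 (0 < p0 < 1) given k successes in n trials.

    Returns (statistic, p_value), where statistic = 2 * [l(p_hat) - l(p0)]
    and the p-value uses the asymptotic chi-square(1) null distribution.
    """
    p_hat = k / n

    def loglik(p):
        # Skip terms with zero count so that 0 * log(0) is treated as 0.
        ll = 0.0
        if k > 0:
            ll += k * math.log(p)
        if k < n:
            ll += (n - k) * math.log(1 - p)
        return ll

    # Clamp at 0 to guard against a tiny negative value from rounding.
    stat = max(0.0, 2 * (loglik(p_hat) - loglik(p0)))
    # Chi-square(1) survival function: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    data = [float(i) for i in range(1, 21)]  # hypothetical sample
    print(bootstrap_percentile_ci(data, lambda v: sum(v) / len(v)))
    stat, p = bernoulli_lrt(60, 100, 0.5)  # e.g. 60 heads in 100 tosses
    print(round(stat, 3))  # 4.027
    print(p < 0.05)  # True
```

The chi-square p-value is only an approximation that improves as n grows; for small samples an exact test may be preferable.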

Prediction

  • Prediction: linear regression.
  • Prediction: classification. Generative (likelihood ratio) vs. discriminative classifiers. The logistic and nearest-neighbor classifiers.
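
A minimal sketch of the two simplest predictors above: least-squares linear regression with a single input variable, and a 1-nearest-neighbor classifier with Euclidean distance. The single-predictor and k = 1 restrictions are simplifying assumptions, and the function names and toy data are illustrative. math.dist requires Python 3.8 or later.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx  # slope
    b0 = y_bar - b1 * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbor classifier: label of the closest training point (Euclidean)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical data lying exactly on y = 1 + 2x
    print(fit_simple_linear_regression(xs, ys))  # (1.0, 2.0)

    X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]  # hypothetical 2-D points
    y = ["a", "a", "b", "b"]
    print(nearest_neighbor_predict(X, y, (0.5, 0.2)))  # a
    print(nearest_neighbor_predict(X, y, (5.5, 4.0)))  # b
```

The nearest-neighbor rule stores the training set and defers all work to prediction time, whereas the regression fit compresses the data into two numbers; this contrast mirrors the parametric vs. nonparametric split in the estimation topics.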

Topics not covered (and not required)

  • Dimension reduction: Principal Component Analysis.
  • Clustering and the EM algorithm.
  • Streaming data. Active and passive data collection. Experiment design.
  • Models for dependent data: sequences, networks, spatial data.