Machine Learning for Big Data
CSE 547/STAT 548 Spring Quarter 2025

Syllabus (tentative)

  • Introduction: learning from Big Data, and what is different?
  • Basics of parallel programming for big data
  • Nearest neighbors in high dimensions: Locality Sensitive Hashing (LSH)
  • Dimension reduction and manifold learning
  • Clustering
    • -- parametric clustering (K-means, EM)
    • -- non-parametric clustering
      • - Bayesian non-parametrics: Dirichlet Process Mixture Models
      • - mode-seeking methods (mean-shift)
      • - level set methods
  • Graph and network data analysis
    • - PageRank and Personalized PageRank
    • - spectral clustering
    • - community detection
    • - network models
  • Information retrieval
  • Data streams
    • - Thompson sampling, reservoir sampling