CSE 547/STAT 548

Machine Learning for Big Data
CSE 547/STAT 548 Spring Quarter 2025

Syllabus (tentative)

Introduction: learning from Big Data -- what is different?
Basics of parallel programming for big data
Nearest neighbors in high dimensions, Locality Sensitive Hashing (LSH)
Dimension reduction and manifold learning
Clustering
- - parametric clustering (K-means, EM)
- - non-parametric clustering
  - - Bayesian non-parametric Dirichlet Process Mixture Models
  - - mode-seeking methods (mean-shift)
  - - level set methods
Graph and network data analysis
- - PageRank and Personalized PageRank
- - spectral clustering
- - community detection
- - network models
Information retrieval
Data streams
- - Thompson sampling, reservoir sampling