Syllabus (tentative)
- Introduction: learning Big Data -- what is different?
- Basics of parallel programming for big data
- Nearest neighbors in high dimensions: Locality Sensitive Hashing (LSH)
- Dimension reduction and manifold learning
- Clustering
- -- parametric clustering (K-means, EM)
- -- non-parametric clustering
- - Bayesian non-parametric Dirichlet Process Mixture Models
- - mode-seeking methods (mean-shift)
- - level set methods
- Graph and network data analysis
- - PageRank and Personalized PageRank
- - spectral clustering
- - community detection
- - network models
- Information retrieval
- Data streams
- - Thompson sampling, reservoir sampling