STATISTICS 593:
Machine Learning, Spring 2000
Instructors: Alejandro Murua and Jon A. Wellner
Tentative Time: T-Th 9:30 - 11:00
First Meeting: Tuesday, March 28
This special topics course will deal with statistical issues arising
from "machine learning". The course will begin with a discussion of the
concept of "learning" and its connection to some classical statistical
problems and terminology.
Although emphasis will be placed on classification error bounds and
model selection techniques, the course will cover both applied and
theoretical aspects of machine learning. Examples of applications will
be drawn from speech and object recognition, document classification,
and other fields.
The course is offered credit/non-credit. Student participation will be
encouraged through special reading assignments, short presentations, or
short term projects.
Tentative Topics / Outline

I. Introduction (3 weeks)
   A. Problems: classification and discrimination (1.5 weeks)
      - Supervised learning:
        - Linear Discriminant Analysis.
        - Logistic Discrimination.
        - Flexible Discriminant Analysis.
      - Instance-based learning: K-Nearest-Neighbors (see the sketch
        following this outline).
      - PAC-learnability; the special case of a finite family of
        classifiers (see the bound following this outline).
   B. Methods: Vapnik-Chervonenkis classes and empirical process
      theory (1.5 weeks)
      - VC-classes, exponential bounds, and PAC-learnability.
      - Glivenko-Cantelli classes of functions.
      - "Universal" Glivenko-Cantelli classes of functions.
      - An introduction to "uniform" exponential bounds.
II. Classification and Decision Trees (2 weeks)
   A. Classification and decision trees (1 week)
      - Bagging (see the sketch following this outline).
      - Boosting.
      - Weakly dependent classifiers.
   B. Exponential bounds, Part II (1 week)
      - Isoperimetric inequalities (Talagrand).
      - Massart's inequalities.
III. Model Complexity: Penalization and Model Selection (3.5 weeks)
   A. Approaches to model selection (1 week)
      - Pruning decision trees.
      - AIC, BIC (see the formulas following this outline).
      - Vapnik's structural risk minimization.
   B. Empirical penalization and exponential bounds, Part III (2.5 weeks)
      - Koltchinskii's approach: Rademacher penalties (see the bound
        following this outline).
      - Birgé and Massart: model selection and "oracle inequalities".
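
The examples below are illustrative sketches only, not course
materials; all function names and data are ours. First, a minimal
K-nearest-neighbors classifier, the instance-based learner of Unit I.A:

    import numpy as np

    def knn_predict(X_train, y_train, X_test, k=3):
        # Classify each test point by majority vote among the labels
        # of its k nearest training points (Euclidean distance).
        predictions = []
        for x in X_test:
            distances = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(distances)[:k]
            labels, counts = np.unique(y_train[nearest], return_counts=True)
            predictions.append(labels[np.argmax(counts)])
        return np.array(predictions)

    # Toy data: two well-separated clusters, labeled 0 and 1.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(knn_predict(X, y, np.array([[0.5, 0.5], [5.5, 5.5]])))  # expected: [0 1]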
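
For the finite-family special case of PAC-learnability (Unit I.A), the
standard starting point is Hoeffding's inequality combined with a union
bound: for 0-1 loss and n i.i.d. training examples, with probability at
least 1 - \delta, every classifier h in a finite family \mathcal{H}
satisfies

    \[
    L(h) \le \hat L_n(h)
      + \sqrt{\frac{\log|\mathcal{H}| + \log(1/\delta)}{2n}},
    \]

where L(h) is the true classification error and \hat L_n(h) the
empirical error. The VC theory of Unit I.B replaces \log|\mathcal{H}|
by a combinatorial measure of the richness of an infinite family.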
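
Next, a sketch of bagging (Unit II.A) under simplifying assumptions
chosen for brevity: binary 0/1 labels and exhaustively fitted decision
stumps as the base classifier:

    import numpy as np

    def fit_stump(X, y):
        # One-split "decision stump": pick the feature, threshold, and
        # label assignment with the smallest 0-1 training error.
        best, best_err = None, np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for l_lab, r_lab in ((0, 1), (1, 0)):
                    err = np.mean(np.where(X[:, j] <= t, l_lab, r_lab) != y)
                    if err < best_err:
                        best_err, best = err, (j, t, l_lab, r_lab)
        return best

    def stump_predict(stump, X):
        j, t, l_lab, r_lab = stump
        return np.where(X[:, j] <= t, l_lab, r_lab)

    def bagging_fit(X, y, n_estimators=25, seed=0):
        # Train one stump per bootstrap resample (sampling rows with
        # replacement); the aggregated classifier takes a majority vote.
        rng = np.random.default_rng(seed)
        n = len(y)
        stumps = [fit_stump(X[idx], y[idx])
                  for idx in (rng.integers(0, n, size=n)
                              for _ in range(n_estimators))]
        def predict(X_new):
            votes = np.mean([stump_predict(s, X_new) for s in stumps], axis=0)
            return (votes >= 0.5).astype(int)  # fraction of stumps voting 1
        return predict

    predict = bagging_fit(X, y)  # X, y from the K-NN example above
    print(predict(np.array([[0.5, 0.5], [5.5, 5.5]])))  # expected: [0 1]

Averaging over bootstrap resamples stabilizes a high-variance base
classifier; the exponential bounds of Unit II.B are one route to making
that intuition precise.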
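
For Unit III.A, the two classical penalized-likelihood criteria, with
\hat L the maximized likelihood, k the number of fitted parameters, and
n the sample size (one selects the model minimizing the criterion):

    \[
    \mathrm{AIC} = -2\log\hat L + 2k, \qquad
    \mathrm{BIC} = -2\log\hat L + k\log n.
    \]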
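
Finally, the shape of the Rademacher-penalty bounds of Unit III.B
(stated here in expectation form for a class \mathcal{F} of [0,1]-valued
functions; constants differ across references): with probability at
least 1 - \delta,

    \[
    \sup_{f \in \mathcal{F}} \bigl(Pf - P_n f\bigr)
      \le 2\,\mathbb{E}\,R_n(\mathcal{F})
      + \sqrt{\frac{\log(1/\delta)}{2n}},
    \qquad
    R_n(\mathcal{F}) = \mathbb{E}_\sigma \sup_{f \in \mathcal{F}}
      \frac{1}{n}\sum_{i=1}^n \sigma_i f(X_i),
    \]

where the \sigma_i are independent Rademacher (\pm 1) signs.
Koltchinskii's penalization replaces the expectation by its empirical,
data-dependent version.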
Some Reference Books:
- Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory
  of Pattern Recognition. Springer, New York.
- Mitchell, T. (1997). Machine Learning. McGraw-Hill, New York.
- Ripley, B. D. (1996). Pattern Recognition and Neural Networks.
  Cambridge University Press, Cambridge.
- Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
- Vidyasagar, M. (1997). A Theory of Learning and Generalization.
  Springer, New York.