Statistics for Data Science
STAT 391 Spring Quarter 2023

About the Final Exam


Logistics

The final exam will take place in EE 125 on Thursday, JUNE 8, 2023, 10:30-12:20.

You will be allowed 6 pages of notes and any number of the solutions handed out by us (it is okay to write notes on the solution pages), but no other books or written materials. No electronic devices of any kind are allowed.

Unlike for the quizzes, everyone will start the exam at the same time: please keep the exam closed and do not start until we tell you to.

Any fact from the course notes and the textbook can be used without proof. This does not apply to facts/statements in the homework solutions.

Don't cheat. Probably all of you, and certainly almost all of you, will simply do your best on this exam. Still, just in case somebody thinks that I tolerate cheating: I don't. If you are caught, there will be consequences. And cheating is deeply unfair to everyone else.

Contents/topics

A list of topics covered can be found below. I will try to cover the course contents approximately uniformly, but keep in mind that there will be about 5 problems in total. It is likely there will be a difficult problem, it is likely there will be an ML estimation problem, and there will certainly be a few multiple-choice questions.

List of topics (short version): everything in the in-class lecture notes except for Clustering (Lectures 19, 20) and Double Descent (parts of Lectures 17, 18). Topics not touched upon in class or homework will not be on the exam (e.g. the variance of beta^ML in linear regression will not be on the exam; classification by likelihood ratio, as in the language classification model, may be).
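As a quick reminder of that last topic, here is a minimal sketch of the likelihood-ratio rule, in my own notation (see the language classification lecture for the full treatment):

```latex
% Classification by likelihood ratio (sketch; notation is mine).
% With two classes C_1, C_2 and priors P(C_1), P(C_2), predict C_1 iff
\[
  \frac{P(x \mid C_1)}{P(x \mid C_2)} \;>\; \frac{P(C_2)}{P(C_1)},
\]
% which, by Bayes' rule, is the same as predicting the class with the
% larger posterior P(C | x).
```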

List of topics (follows the lectures; pointers to book chapters are in the lecture notes; this list is NOT exhaustive: there may be concepts not listed here but presented in lectures):
  • Discrete sample spaces, repeated independent trials, probabilities of sequences vs. probabilities of counts, Binomial and Multinomial distributions
  • ML estimation for discrete distributions, special distributions (e.g. Poisson), sufficient statistics
  • ML estimation in advanced cases, e.g. censored data, tied parameters. Writing the likelihood P[ data | parameters ] in a new case.
  • Beyond ML: estimating small probabilities. Why and how (NE, WB, Laplace only)
  • Continuous sample spaces, parametric families of distributions of continuous S.
  • Parametric density estimation by ML. Sufficient statistics.
  • Maximizing the likelihood by gradient ascent (see the sketch after this list).
  • Kernel density estimation.
  • Cross-validation and using it to select kernel width.
  • Bias, Variance, overfitting, underfitting
  • Mixture Models (Mixtures of Gaussians)
  • Model Selection; Model selection for parametric models, AIC, BIC, counting parameters
  • Statistical estimators (e.g. ML) as random variables. Consistency, bias, variance
  • Bayesian estimation for parametric models (over discrete S only). Dirichlet/Beta distribution as conjugate prior.
  • Prediction concepts. Linear regression. Logistic regression.
  • Multivariate Gaussian distribution.
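
To make the gradient-ascent item above concrete, here is a minimal sketch in Python (mine, not from the course materials), using Poisson data so the numerical answer can be checked against the closed-form ML estimate lambda^ = mean(x):

```python
import numpy as np

# Sketch (not from the course materials): ML estimation of a Poisson
# parameter by gradient ascent on the log-likelihood.
#   log L(lam) = sum_i ( x_i * log(lam) - lam - log(x_i!) )
#   d/d(lam) log L(lam) = sum(x) / lam - n
# The closed-form maximizer is lam = mean(x); the iteration should
# recover it numerically.

rng = np.random.default_rng(0)
x = rng.poisson(lam=4.0, size=1000)   # synthetic data for illustration

lam = 1.0                             # initial guess (must stay > 0)
step = 1e-4                           # fixed step size
for _ in range(10_000):
    grad = x.sum() / lam - len(x)     # gradient of the log-likelihood
    lam += step * grad

print(f"gradient ascent: {lam:.4f}, closed form: {x.mean():.4f}")
```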

FAQ

Some problems require you to "show your work". If we don't explicitly ask you to "show your work", only the final answer is required/graded.

For full credit, simplify each answer as much as possible when it is a literal or numeric expression. Occasionally we may ask you to just "plug in numerical values"; in that case you don't need to simplify.

Q: Is my data { 1, 1, 1, 2, 3, 3 } a sequence or not? In other words, should I include the combinatorial coefficient 6!/(3! 1! 2!) in the likelihood or not?

A: When this coefficient does not affect the final answer, you can leave it out or include it. For example, if you do ML estimation, the parameter estimate will be the same with or without the multinomial coefficient. In other situations, we will try to make clear whether the data is a sequence or a multiset.
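
A one-line justification, as a sketch consistent with the ML derivations in the lecture notes: the coefficient does not depend on the parameters, so it only adds a constant to the log-likelihood. For the data above, with counts n_1 = 3, n_2 = 1, n_3 = 2:

```latex
\[
  L(\theta) \;=\; \frac{6!}{3!\,1!\,2!}\,\theta_1^{3}\,\theta_2^{1}\,\theta_3^{2},
  \qquad
  \log L(\theta) \;=\; \mathrm{const} + 3\log\theta_1 + \log\theta_2 + 2\log\theta_3 .
\]
% The additive constant does not affect the maximizer, so either way
% \hat{\theta}_k = n_k / n, e.g. \hat{\theta}_1 = 3/6 = 1/2.
```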

Q: Will the exam be like a quiz?

A: Yes and no. Yes, the organization will be the same. No, some problems will be longer, more like homework problems. In fact, 40% of the problems will be similar to the ungraded homeworks 2, 4 and 6.

Q: What about the problems on Hw?

A: At least 40% of exam points will be from problems very similar to those in Hw 2, 4, 6.

Q: What are LDA, QDA?

A: QDA = Quadratic Discriminant Analysis = a name for the generative classifier where P_X|C is normal and each class has a different covariance matrix Sigma_c.

LDA = Linear Discriminant Analysis = a name for the generative classifier where P_X|C is normal and the classes share the same covariance matrix Sigma.
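
For the curious, a sketch (my derivation, not from this year's notes) of why the names fit:

```latex
% With P(x | C_c) = N(mu_c, Sigma_c), the log-posterior of class c is
\[
  \log P(C_c \mid x) \;=\; -\tfrac{1}{2}(x-\mu_c)^{\top}\Sigma_c^{-1}(x-\mu_c)
  \;-\; \tfrac{1}{2}\log\lvert\Sigma_c\rvert \;+\; \log P(C_c) \;+\; \mathrm{const}.
\]
% QDA: each class has its own Sigma_c, so the quadratic terms
% x^T Sigma_c^{-1} x differ across classes and the decision boundaries
% are quadratic in x.
% LDA: a shared Sigma makes the quadratic term identical for all
% classes; it cancels in comparisons, leaving linear boundaries.
```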

These terms are used in some older exams, but since I did not use them this year, don't worry about them. If you are curious, you can find the definitions in the textbook.

Partial credit: we give partial credit to the extent that your work shows correct thinking or good use of available knowledge. We always read what you write and try to make sense of it. We give partial credit for quality, not quantity.

Review sessions

Marina Meila will hold office hours/review on 6/6, 12-2pm (notes from this review are l14-may11-extras.pdf, with the extra notes on pages 5, 6, 20, 21, and exam-review-notes.pdf). Shreya Prakash will hold a review on 6/5 at 4pm.

Sample exams with solutions are here