
STAT/CS&SS 504: Applied Regression Analysis

Winter Quarter 2011

Instructor: Adrian Raftery, Department of Statistics. My office is room C-313, Padelford Hall. My phone number is 206-543-4505; I will answer it during office hours. My email address is raftery AT uw DOT edu; email is the best way to contact me outside office hours.

Grader: Steven Danna (Evans School of Public Affairs; email sdanna7 at uw dot edu) will be the grader for the course. He will grade homework, hold office hours, and answer questions by email.

Office hours: I will hold office hours on Thursdays from 4:00-5:30pm in Padelford C-313. Please do not hesitate to come and see me if you have a problem or if you just want to discuss issues arising in the class.

I will also hold "electronic office hours" by responding to email questions, with a target response time of one working day. If it seems appropriate to me, and if you don't ask me not to, I will send the response to the class mailing list (see below), after removing your name and identifying information.

Steven Danna will hold office hours on Fridays from 4:00-5:00pm in Padelford B-302.

Here is a summary of the general course schedule and office hours.

STAT/CS&SS 504: Course Schedule and Office Hours

Time       | Monday              | Tuesday | Wednesday | Thursday                      | Friday
12:30-1:20 |                     |         |           | Quiz section                  |
2:30-3:20  | Class: Homework due |         | Class     |                               | Class
4:00-5:30  |                     |         |           | Adrian office hours PDL C-313 | Steven office hours PDL B-502

Prerequisites: One of the following:

Relation to other courses: There are several applied regression courses on campus; this one is probably the most technically rigorous. This course covers material similar to STAT 423, but at a more advanced level. This is a graduate course; if you are an undergraduate you should probably be taking STAT 423 instead. This course covers material similar to SOC 506/CS&SS 507, but with a more technical emphasis.

Registration: Please register for the course for credit; auditing is not allowed. If you are not a registered student but are a UW employee, you may be eligible to take this class tuition-free via the UW Tuition Exemption Benefit. In any event, all students must register. See the registration instructions for students, UW employees and non-UW individuals.

Requirements: Your course grade will be based on homework assignments (35%), quiz section participation (5%), a mid-term exam (25%), a group project (30%), and participation in the project presentation sessions (5%). If not enough project topics are proposed, there will be an in-class final exam instead of the group project. If you provide the dataset for one of the group projects, 0.1 will be added to your final grade.

Homework will be assigned most weeks, and will be due in class on the Monday of the following week at 2:30pm, in printed (paper) form. In the two weeks with Monday holidays, any homework will instead be due by 10:00am on Tuesday in the STAT 504 mailbox in Padelford B-313D. The schedule is designed so that homework can be corrected and returned to you quickly, usually at the Thursday quiz section, where it will be discussed. Because of this quick turnaround, we will not be able to accept late homework.

Computing: Most of the homework assignments will involve computing. The preferred software for the class is R, which you may use on any platform you wish, including your own PC (it runs under Windows, Mac OS X and Linux). R can be downloaded for free from CRAN, where good introductory documentation is also available.

Class mailing list: There will be a class mailing list. Your address is automatically part of the mailing list, and you may post to it from your email address. Please feel free to post to the class mailing list.

Course description: Introduction to simple and multiple linear regression. Estimation, problems in interpreting regression coefficients, weighted least squares, categorical independent variables, interaction. Analysis of variance.

Regression analysis aims to explain or predict a quantitative outcome or dependent variable in terms of independent or predictor variables. Example questions include:

One of the achievements of the discipline of statistics has been to work out a theory of linear regression, in which the outcome is predicted by a linear combination of the independent variables and the errors are independent. This is a beautiful theory, which was largely complete by 1940, and is very widely applied: it has been estimated that more than one million regression models are fitted in the world every day. In roughly the first half of the course we will describe this theory, lay out the underlying model, and show how to estimate its parameters and test hypotheses relating to them. We will also show that it is extremely flexible, allowing one to model categorical independent variables, interactions, and certain forms of nonlinearity.
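As a rough illustration of what "estimating the parameters" means in the simplest case (one independent variable), here is a minimal sketch of least-squares estimation. It is written in plain Python only so that it is self-contained; in R, the class software, the `lm` function does the same job. The data values and the function name `fit_simple_regression` are made up for illustration and are not part of the course materials.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x + error."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data (not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(x, y)
# Residuals: observed outcome minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any hand-rolled fit.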

As always, though, there's a catch; nothing comes for free. In regression, the catch is that the theory is based on a certain number of assumptions: normality of errors (essentially, there are no outliers), linearity, equal variance of errors, independence of errors, and availability of data on all the variables for each case or data point. These assumptions often don't hold, unfortunately, in which case conclusions from "running" a regression can be severely flawed. We'll discuss ways of detecting these problems, assessing their implications for inference, and remedying them. Other tricky issues that arise, without necessarily being model violations, include influential points and high correlations between independent variables, and we will discuss these too. We will also discuss the choice of independent variables, and model selection and model building in practice.
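To give a flavor of what detecting such problems can look like, here is a minimal sketch that computes two standard diagnostics for a regression with one independent variable: the leverage of each point (how unusual its x value is) and its standardized residual (how far the outcome lies from the fitted line, in estimated standard-error units). It is written in plain Python only so that it is self-contained. The data and the function name `simple_regression_diagnostics` are made up for illustration; the sixth point is deliberately placed far from the others in x.

```python
from math import sqrt
from statistics import mean


def simple_regression_diagnostics(x, y):
    """Return (leverages, standardized_residuals) for y = b0 + b1 * x + error."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Estimated error standard deviation, using n - 2 degrees of freedom.
    sigma_hat = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point in simple regression: 1/n + (x_i - x_bar)^2 / Sxx.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # This sketch assumes sigma_hat > 0 and every leverage < 1;
    # otherwise the division below would fail.
    standardized = [
        e / (sigma_hat * sqrt(1 - h)) for e, h in zip(residuals, leverages)
    ]
    return leverages, standardized


# Made-up illustrative data; the last point sits far from the rest in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0]

leverages, standardized = simple_regression_diagnostics(x, y)
for xi, h, r in zip(x, leverages, standardized):
    print(f"x = {xi:5.1f}  leverage = {h:.3f}  standardized residual = {r:6.2f}")
```

The leverages always sum to the number of fitted coefficients (here two), so a single point with leverage close to 1 is carrying a large share of the fit.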

Much of the second half of the course will be devoted to situations where the standard regression model may not hold. There will be a focus on more recent methods developed over the past 30 years or so, which are often computationally intensive. The emphasis will be on applications and interpretation, rather than equations and mathematical derivations. We will illustrate the ideas with the analysis of data from recent or current research.

Textbook: Weisberg, S. (2005). Applied Linear Regression, 3rd edition, Wiley.
Several other readings, on which I will draw in the second half of the course, will be available on the Web.

Course outline: In the outline below, Wa refers to chapter a of the Weisberg text, and Wa.b refers to section a.b of the Weisberg text. The other references are to articles that will be available as supplementary readings.

  1. Linear regression
    1. Basic concepts
    2. Linear regression with one independent variable: W2
    3. Multiple linear regression: W3.1-3.3
    4. Estimation: W3.4, W4.1
    5. Testing: W3.5, W5.2-5.4
    6. Weighted least squares: W5.1
    7. Categorical independent variables: W6
    8. Interactions

  2. Problems, consequences, symptoms and remedies
    1. Variable selection and model uncertainty: W10; Raftery (1995); Raftery, Painter and Volinsky (2005)
    2. Outliers: Residuals, diagnostics and robust regression: W1, W8.1, W9.1
    3. Influential observations: W9.2
    4. Nonlinearity: Transformations, smoothing and ACE: W7; De Veaux (1989)
    5. Nonconstant variance: Transformations and weighting: W8.3
    6. Missing data: W4.5; Little and Rubin (1989); Little (1992)
    7. Binary and binomial outcomes: Logistic regression: W12

Last updated May 26, 2011