**Instructor:**
Adrian Raftery, Department of Statistics.
My office is room C-313, Padelford Hall.
My phone number is 206-543-4505, which I will answer during office hours.
My email address is raftery AT uw DOT edu; email is the best way
to contact me outside office hours.

**Grader:** Steven Danna, Evans School of Public Affairs
(email sdanna7 at uw dot edu). He will grade homework, hold office
hours, and answer questions by email.

**Office hours:** I will hold office hours on
Thursdays from 4:00-5:30pm in Padelford C-313.
Please do not hesitate to come and see me if you have a problem or if you
just want to discuss issues arising in the class.

I will also hold "electronic office hours" by responding to email questions, with a target response time of one working day. If it seems appropriate to me, and if you don't ask me not to, I will send the response to the class mailing list (see below), after removing your name and identifying information.

Steven Danna will hold office hours on Fridays from 4:00-5:00pm in Padelford B-302.

Here is a summary of the general course schedule and office hours.

Time | Monday | Tuesday | Wednesday | Thursday | Friday |
---|---|---|---|---|---|
12:30-1:20 | | | | Quiz section | |
1:30-2:20 | | | | | |
2:30-3:20 | Class: Homework due | Class | Class | | |
3:30-4:00 | | | | | |
4:00-5:30 | | | | Adrian office hours PDL C-313 | Steven office hours PDL B-502 |

**Prerequisites:** One of the following:

- STAT 502
- STAT 421
- STAT 342
- STAT 390
- a grade of at least 3.0 in STAT 311, plus MATH 126
- SOC 505
- permission of instructor.

**Relation to other courses:**
There are several applied regression courses on campus; this one
is probably the most technically rigorous.
This course covers material similar to STAT 423, but at a more
advanced level. This is a graduate course; if you are an undergraduate
you should probably be taking STAT 423 instead.
This course covers material similar to SOC 506/CS&SS 507,
but with a more technical emphasis.

**Registration:**
Please register for the course for credit; auditing is not allowed.
If you are not a registered student but are a UW
employee, you may be eligible to take this class tuition-free via
the UW Tuition Exemption Benefit. In any event, all students
must register. See the
registration instructions for students, UW employees and non-UW
individuals.

**Requirements:**
Your course grade will be based on homework assignments (35%),
quiz section participation (5%),
a mid-term exam (25%),
a group project (30%), and participation in the project presentation
sessions (5%). If not enough project topics are proposed,
there will be an in-class final exam instead of the group project.
If you provide the dataset for one of the group projects,
0.1 will be added to your final grade.

Homework will be assigned most weeks and will be due in class on the Monday of the following week at 2:30pm, in printed (paper) form. In the two weeks with Monday holidays, any homework will instead be due by 10:00am on Tuesday in the STAT 504 mailbox in Padelford B-313D. The schedule is designed so that homework can be corrected and returned to you quickly, usually at the Thursday quiz section, where it will be discussed. To make this possible, we cannot accept late homework. Many of the homework assignments will involve computing.

**Computing:**
Most of the homework assignments will involve computing.
The preferred software for the class is R, and you may use this
on any platform that you wish, including your own PC (it runs under
Windows, Mac OS X and Linux). R can be downloaded for free from
CRAN, where
good introductory documentation is also available.

**Class mailing list:**
There will be a class mailing list. Your uw.edu address is
automatically part of the mailing list, and you may post to it
from your uw.edu email address. Please feel free to post
to the class mailing list.

**Course description:**
Introduction to simple and multiple linear regression. Estimation, problems in
interpreting regression coefficients, weighted least squares, categorical
independent variables, interaction. Analysis of variance.

Regression analysis aims to explain or predict a quantitative outcome or dependent variable in terms of independent or predictor variables. Example questions include:

- What will the temperature be at the University of Washington in 48 hours, given current conditions and National Weather Service forecasts?
- To what extent can we explain people's occupational and professional success in terms of their family background and education?
- What are the factors influencing differences in birth rates between different regions?

One of the achievements of the discipline of statistics has been to work out a theory of linear regression, where the outcome can be predicted by a linear combination of the independent variables and the errors are independent. This is a beautiful theory, which was largely complete by 1940, and is very widely applied: it has been estimated that more than one million regression models are estimated in the world every day. In roughly the first half of the course we will describe this theory, lay out the underlying model, and show how to estimate its parameters and test hypotheses relating to them. We will also show that it is extremely flexible, allowing one to model categorical independent variables, interactions, and certain forms of nonlinearity.
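As a rough illustration of the simplest case of this theory, here is a minimal sketch of least-squares estimation with one independent variable. It is written in plain Python only so that it is self-contained; the course itself uses R. The data values and the function names `fit_simple_regression` and `residuals` are made up for illustration and do not come from the course materials.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed outcome minus fitted value, one per data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data, invented purely for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(x, y)
res = residuals(x, y, intercept, slope)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print("residuals:", [round(r, 3) for r in res])
```

When the model includes an intercept, the least-squares residuals sum to zero (up to rounding error), which makes a quick sanity check on a fit like this one.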

As always, though, there's a catch; nothing comes for free. In regression, the catch is that the theory is based on a certain number of assumptions: normality of errors (essentially, there are no outliers), linearity, equal variance of errors, independence of errors, and availability of data on all the variables for each case or data point. These assumptions often don't hold, unfortunately, in which case conclusions from "running" a regression can be severely flawed. We'll discuss ways of detecting these problems, assessing their implications for inference, and remedying them. Other tricky issues that arise, without necessarily being model violations, include influential points and high correlations between independent variables, and we will discuss these too. We will also discuss the choice of independent variables, and model selection and model building in practice.

Much of the second half of the course will be devoted to situations where the standard regression model may not hold. There will be a focus on more recent methods developed over the past 30 years or so, which are often computationally intensive. The emphasis will be on applications and interpretation, rather than equations and mathematical derivations. We will illustrate the ideas with the analysis of data from recent or current research.

**Text:**

Weisberg, S. (2005). *Applied Linear Regression*,
3rd edition, Wiley.

In the second half of the course I will also draw on
several other readings, which will be
available on the Web.

**Course outline:**
*Note:* Wa refers to chapter a of the Weisberg text, and
Wa.b refers to section a.b of the Weisberg text.
The other references are to articles that will be
available as supplementary readings.

- Linear regression
  - Basic concepts
  - Linear regression with one independent variable: W2
  - Multiple linear regression: W3.1-3.3
  - Estimation: W3.4, W4.1
  - Testing: W3.5, W5.2-5.4
  - Weighted least squares: W5.1
  - Categorical independent variables: W6
  - Interactions

- Problems, consequences, symptoms and remedies
  - Variable selection and model uncertainty: W10; Raftery (1995); Raftery, Painter and Volinsky (2005)
  - Outliers: Residuals, diagnostics and robust regression: W1, W8.1, W9.1
  - Influential observations: W9.2
  - Nonlinearity: Transformations, smoothing and ACE: W7; De Veaux (1989)
  - Nonconstant variance: Transformations and weighting: W8.3
  - Missing data: W4.5; Little and Rubin (1989); Little (1992)
  - Binary and binomial outcomes: Logistic regression: W12

Last updated May 26, 2011