Preface


Traditionally, applied probability texts contain a fair amount of probability theory, varying amounts of applications, and no data. Occasionally an author may touch upon how one would go about fitting a model to data, or use data to develop a model, but rarely is this topic given much weight. On the other hand, the few texts on inference for stochastic processes mostly dwell at length upon the interesting twists that occur in statistical theory when data no longer can be assumed iid. But, again, they rarely contain any data. The intent of this text is to present some probability models, some statistics relevant to these models, and some data that illustrate some of the points made.

My experience as a practicing statistician, specializing in models for dependent data, has been that no real progress can be made without spending substantial time trying to master enough of the subject matter to be able to talk to area scientists in their language, rather than in mathematical language. Consequently, I have tried to give scientific background to many of the applications I present in more detail than is usual in statistics texts. A general scientific background, but no special training, should suffice to enable you to follow the gist of the explanations. From a mathematical point of view you need elementary probability, including conditional distributions and expectations, calculus, ordinary differential equations, and linear algebra. Having encountered statistics would be very useful, although it is not strictly necessary. My students tell me that a first course in stochastic processes is a very useful background, but I have tried to include sufficient material to make this unnecessary. I avoid measure-theoretical arguments, but have to use some L2-theory to introduce (albeit in an extremely low-key manner) stochastic integration.

As the chapters progress, there are fewer formal proofs and more references to the literature for verification of results. While much discrete-time Markov chain theory can be done using elementary probability, the proofs become progressively harder as the processes become more complicated. I therefore often resort to special cases and intuitive explanations.

Picking topics is necessarily a very subjective matter. I have omitted some topics that others may find essential. For example, while renewal theory is a beautiful piece of probability, I have not found many interesting scientific applications for it. Martingales are not included, in spite of their usefulness in statistical theory. I ignore stationary time series, since in this area there are plenty of books containing both theory and data analysis. On the other hand, I try to emphasize the importance of Markov chain Monte Carlo methods, which are having as profound an impact on statistics in the nineties as the bootstrap did in the eighties. I have found the state space modeling approach useful in a variety of situations, as I am sure you can tell.

At first I intended to write a short introduction indicating why I think stochastic models are useful; it grew into a chapter of its own. Three appendices contain some material that not everyone may have encountered before. I hope there is enough so that, for example, someone without much mathematical statistics will get an idea of the main tools that are used in the text.

The material in this book is suitable for a two-semester course. I have tried to cover most of it in a two-quarter sequence, but that gets a bit rushed. When I teach this course, one of the main teaching tools is laboratory assignments, in which the students analyze data sets, simulate processes and statistical procedures, and work through some theory as well. I have included versions of these laboratory assignments in the exercises that follow each chapter. These exercises contain extensions and details of the probabilistic exposition, computational exercises, and data analysis problems. In conjunction with the book, several data sets will be available via anonymous ftp (details are given after the indexes).

There is a variety of examples and applications. The former attempt to illustrate concepts, while the latter apply the concepts to data. In order to facilitate reference to the book in lectures I have numbered all displayed equations. Theorems, propositions, lemmata, figures, and tables are numbered separately. To simplify life a little for the readers there are, in addition to the regular indexes of terms and notation, indexes of examples/applications and of numbered theorems, propositions, and lemmata. Definitions are indicated with bold typeface (and referred to in the Index of terms). As is common in applied probability (but not in statistics), all vectors are row vectors.

The applications in the text draw on many different sources. Some originate in work by me or my students, others come from colleagues and friends whom I have talked into giving me their data, and yet others are quoted from the literature. At the end of each chapter I try to acknowledge my main sources, as well as give occasional hints as to where one can learn more (preferably discussion papers). I apologize for any inadvertent failures to make such acknowledgements.

A large number of students and colleagues have offered comments and suggestions regarding the text. Julian Besag, Michael Phelan, and Elizabeth Thompson have been particularly helpful, as have a host of students over the years. Four reviewers made numerous helpful comments: Simon Tavare, Michael P. Bailey, David D. Yao, and Laurence Baxter. John Kimmel was a patient and helpful editor throughout the project, while Achi Dosanjh and Emma Broomby skillfully guided me through the final stages of manuscript preparation. Several colleagues have generously provided data, figures, or other help. I would particularly like to thank Joao Batista, David Brillinger, Anna Guttorp, David Higdon, Jim Hughes, Stephen Kaluzny, Niels Keiding, Brian Leroux, Hongzhe Li, Iain MacDonald, Roland Madden, Thomas Murray, Michael Newton, Haiganoush Preisler, John Rice, Brian Ripley, Nuala Sheehan, Jim Smith, and Elizabeth Thompson.

The text was produced using Eroff from Elan Computer Group, Inc. The statistical analysis was generally done in Splus (StatSci division of Mathsoft, Inc.) and most figures benefited greatly from the psfig utility written by Antonio Possolo and the setps wrapper written by Frank Harrell. The spatial point process analysis employed the Splancs library produced by B. S. Rowlingson and Peter Diggle at Lancaster University. Software for some specialized analyses was provided by Charlie Geyer and John Rice.

This work had partial support from the National Science Foundation (DMS-9115756), the National Institutes of Health (HL 31823), the Environmental Protection Agency, and the Electric Power Research Institute. This text cannot in any way be construed as official policy of any of these organizations. The final version was produced during a visiting scholar appointment at the Institute for Environmental Studies at the University of Washington, and I am grateful for their generous support during a most difficult time.

My family has patiently suffered through my long preoccupation with this manuscript. June, Eric, and Kenji deserve much more attention (and will, hopefully, get it) from me. They have been very supportive of the effort.

I was fortunate to have Jerzy Neyman and David Brillinger as advisers when I was a graduate student. From them I learned a way of thinking about science, about data, and about modeling. And I learned, in Mr. Neyman's words, that "Life is complicated, but not uninteresting."