Model-Based Clustering and Spatial Point Pattern Research: Adrian Raftery

Cluster analysis has developed mainly as a set of ad hoc methods. More recently it has been found that basing cluster analysis on a probability model can be useful both for understanding when existing methods are likely to be successful, and for suggesting new and better methods. Most clustering methods have been developed for the situation where the groups to be identified are well separated "blobs" in p-space. I have been interested in the case where the groups are defined by their shape, may be clustered around lines or even thin nonlinear curves, and may even intersect. Examples are groups of boundary pixels in images, groups of earthquakes clustered along seismic faults, and stars grouped in galaxies.

Recent exciting developments include the growing ability to use standard statistical model comparison tools (Bayes factors) to choose both the number of clusters and the clustering method (Fraley and Raftery 1998), as well as to allow automatically for outliers and clutter. Chris Fraley's recent EMCLUST software, currently at the test stage, will provide capabilities to do this.
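As a rough illustration of choosing the number of clusters by model comparison, the sketch below fits one-dimensional Gaussian mixtures by EM and compares them with BIC, a standard large-sample approximation to the Bayes factor. It is not the EMCLUST implementation. The function names (`fit_mixture`, `bic`, `choose_k`), the quantile-based starting values, the variance floor and the restriction to one dimension are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_loglik(data, weights, means, variances):
    """Log-likelihood of the data under a univariate Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def fit_mixture(data, k, n_iter=100):
    """Fit a k-component mixture by EM and return the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Assumed starting values: means at evenly spaced sample quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * grand_var  # assumed guard against collapsing components
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each group.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # tiny constant avoids division by zero
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return mixture_loglik(data, weights, means, variances)


def bic(loglik, k, n):
    """BIC in the 'larger is better' form: 2 log L - (number of parameters) log n."""
    n_params = 3 * k - 1  # (k - 1) proportions + k means + k variances
    return 2.0 * loglik - n_params * math.log(n)


def choose_k(data, max_k=4):
    """Return the number of groups with the largest BIC, plus all BIC values."""
    scores = {k: bic(fit_mixture(data, k), k, len(data)) for k in range(1, max_k + 1)}
    return max(scores, key=scores.get), scores


# Usage: two well-separated simulated groups.
rng = random.Random(0)
sample = ([rng.gauss(0.0, 1.0) for _ in range(150)]
          + [rng.gauss(10.0, 1.0) for _ in range(150)])
best_k, scores = choose_k(sample)
print(best_k, scores)
```

The same comparison extends to choosing among clustering methods: each method corresponds to a different constraint on the group distributions, so each (method, number of groups) pair is one candidate model with its own BIC value.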

I am also interested in extending cluster analysis to deal with the identification of features in spatial point patterns, and of boundaries in images. By recasting cluster analysis in terms of a mixture model and appropriately constraining the distribution of points within each group, progress can be made. Very useful methods result by assuming that groups are highly concentrated about a line, or that they are highly concentrated about a nonparametric curve, modeled by a principal curve (Hastie and Stuetzle, JASA, 1989).
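The mixture-model formulation can be written out as follows. This is a sketch in the notation commonly used for Gaussian model-based clustering; the symbols (G groups, mixing proportions \tau_k, scale \lambda_k, orientation D_k, shape A_k) are introduced here for illustration and should be checked against Banfield and Raftery (1993) for the exact parameterization used there.

```latex
% Mixture likelihood for observations x_1, ..., x_n in p dimensions, G groups
\[
  L(\theta, \tau \mid x_1, \dots, x_n)
    = \prod_{i=1}^{n} \sum_{k=1}^{G} \tau_k \, f_k(x_i \mid \theta_k),
  \qquad \tau_k \ge 0, \quad \sum_{k=1}^{G} \tau_k = 1 .
\]

% Gaussian groups, with each covariance matrix split into
% scale (\lambda_k), orientation (D_k, orthogonal) and shape (A_k, diagonal)
\[
  f_k(x \mid \mu_k, \Sigma_k) = \mathcal{N}(x;\, \mu_k, \Sigma_k),
  \qquad
  \Sigma_k = \lambda_k \, D_k A_k D_k^{\mathsf{T}},
\]
\[
  A_k = \operatorname{diag}(1, \alpha_{2k}, \dots, \alpha_{pk}),
  \qquad 1 \ge \alpha_{2k} \ge \dots \ge \alpha_{pk} > 0 .
\]
```

Under this parameterization, a group that is highly concentrated about a line has \alpha_{jk} much smaller than 1 for every j \ge 2, with the first column of D_k giving the direction of the line. Constraining \lambda_k, D_k or A_k (for example, holding the shape A_k equal across groups) yields different clustering methods as special cases of one model.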

Other useful methods for finding features in spatial point patterns involve nonparametric maximum likelihood using Voronoi polygons, nearest neighbor denoising, and explicit modeling backed up by reversible jump Markov chain Monte Carlo estimation.
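The idea behind nearest neighbor denoising can be sketched in a few lines: points that belong to a feature sit in dense regions, so the distance to their k-th nearest neighbor is small, while clutter points have larger distances. The sketch below is a deliberately simplified stand-in, not the published method. It separates the log distances into two groups with a one-dimensional two-means split rather than fitting a mixture model to the distances, and the function names, the choice k = 5 and the simulated data are all assumptions made for the example.

```python
import math
import random


def kth_nn_distances(points, k):
    """Distance from each 2-D point to its k-th nearest neighbor (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        distances.append(others[k - 1])
    return distances


def split_two_groups(values, n_iter=50):
    """One-dimensional two-means; True marks values in the lower-center group."""
    low_center, high_center = min(values), max(values)
    labels = [True] * len(values)
    for _ in range(n_iter):
        labels = [abs(v - low_center) <= abs(v - high_center) for v in values]
        low = [v for v, is_low in zip(values, labels) if is_low]
        high = [v for v, is_low in zip(values, labels) if not is_low]
        if not low or not high:
            break  # all values fell into one group; nothing left to separate
        low_center = sum(low) / len(low)
        high_center = sum(high) / len(high)
    return labels


def denoise(points, k=5):
    """Flag each point as feature (True) or clutter (False)."""
    # Work on the log scale; the max() guards against coincident points.
    log_d = [math.log(max(d, 1e-12)) for d in kth_nn_distances(points, k)]
    return split_two_groups(log_d)


# Usage: a tight simulated feature embedded in uniform clutter on the unit square.
rng = random.Random(1)
feature = [(rng.gauss(0.5, 0.01), rng.gauss(0.5, 0.01)) for _ in range(100)]
clutter = [(rng.random(), rng.random()) for _ in range(100)]
flags = denoise(feature + clutter)
print(sum(flags[:100]), "of 100 feature points flagged as feature")
print(sum(flags[100:]), "of 100 clutter points flagged as feature")
```

Replacing the two-means split with a two-component mixture fitted to the distances would give posterior probabilities of feature membership instead of hard labels, which is closer in spirit to the model-based approach described above.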

Key References:

Jeffrey D. Banfield and Adrian E. Raftery (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803-821.

Chris Fraley and Adrian E. Raftery (1998). How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Technical Report no. 329, Department of Statistics, University of Washington.

