Most clustering methods have been developed for the situation where the
groups to be identified are well separated "blobs" in p-space.
I have been interested in the case where the groups are defined by their
*shape*, may be clustered around lines or even thin nonlinear curves, and
may even intersect. Examples are groups of boundary pixels in images,
groups of earthquakes clustered along seismic faults, and stars grouped
in galaxies.

For a review of model-based clustering, see our 2019 book, *Model-Based Clustering and Classification for Data Science: With Applications in R*, as well as Fraley and Raftery (2002). The free software package MCLUST, available for R, implements these methods.

More recent research projects in this area include model-based clustering for social networks, variable selection for model-based clustering, merging Gaussian mixture components to represent non-Gaussian clusters, and Bayesian model averaging for model-based clustering.

Young, W.C., Raftery, A.E. and Yeung, K.Y. (2017).
Model-based clustering with data correction for removing artifacts
in gene expression data.
*Annals of Applied Statistics* 11:1998-2026. (Open access).

Scrucca, L., Fop, M., Murphy, T.B. and Raftery, A.E. (2016).
mclust 5: Clustering, classification and density estimation using
Gaussian finite mixture models.
*R Journal* 8:289-317.

Russell, N., Murphy, T.B. and Raftery, A.E. (2015). Bayesian model averaging in model-based clustering and density estimation. Technical Report no. 635, Department of Statistics, University of Washington. Also arXiv:1506.09035.

Scrucca, L. and Raftery, A.E. (2015).
Improved initialisation of model-based clustering using a Gaussian
hierarchical partition.
*Advances in Data Analysis and Classification* 9:447-460.

Scrucca, L. and Raftery, A.E. (2014). clustvarsel: A Package Implementing Variable Selection for Model-based Clustering in R. Technical Report no. 629, Department of Statistics, University of Washington. Also arXiv:1411.0606.

Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. and
Raftery, A.E. (2014).
Comparing Model Selection and Regularization
Approaches to Variable Selection in Model-Based Clustering.
*Journal de la Société Française de Statistique*,
155(2):57-71.

Raftery, A.E., Niu, X., Hoff, P.D. and Yeung, K.Y. (2012).
Fast Inference for the
Latent Space Network Model Using a Case-Control Approximate Likelihood.
*Journal of Computational and Graphical Statistics*, 21:909-919.

Baudry, J.-P., Raftery, A.E., Celeux, G., Lo, K. and Gottardo, R. (2010).
Combining Mixture Components for Clustering.
*Journal of Computational and Graphical Statistics* 19:332-353.

Murphy, T.B., Dean, N. and Raftery, A.E. (2010).
Variable Selection and Updating In Model-Based Discriminant Analysis
for High Dimensional Data with Food Authenticity Applications.
*Annals of Applied Statistics* 4:396-421.

Dean, N. and Raftery, A.E. (2010).
Latent Class Analysis Variable Selection.
*Annals of the Institute of Statistical Mathematics* 62:11-35.

Steele, R.J., Wang, N. and Raftery, A.E. (2010).
Inference from multiple imputation for missing data
using mixtures of normals.
*Statistical Methodology* 7:351-365.

Steele, R.J. and Raftery, A.E. (2010). Performance of Bayesian Model Selection Criteria for Gaussian Mixture Models. In *Frontiers of Statistical Decision Making and Bayesian Analysis* (edited by M.-H. Chen et al.), pages 113-130, New York: Springer.

Krivitsky, P., Handcock, M.S., Raftery, A.E. and Hoff, P. (2009).
Representing Degree Distributions, Clustering, and Homophily in
Social Networks With Latent Cluster Random Effects Models.
*Social Networks* 31:204-213.

Handcock, M.S., Raftery, A.E. and Tantrum, J. (2007).
Model-based clustering
for social networks (with Discussion).
*Journal of the Royal Statistical Society, Series A*,
170, 301-354.

Oh, M.-S. and Raftery, A.E. (2007).
Model-based Clustering with Dissimilarities: A Bayesian Approach.
*Journal of Computational and Graphical Statistics*, 16, 559-585.

Fraley, C. and Raftery, A.E. (2007).
Bayesian Regularization for Normal
Mixture Estimation and Model-Based Clustering.
*Journal of Classification*, 24, 155-181.

Fraley, C. and Raftery, A.E. (2007).
Model-based
methods of classification: Using the mclust software in chemometrics.
*Journal of Statistical Software*, 18, paper i06.

Fraley, C. and Raftery, A.E. (2006). MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering. Technical Report no. 504, Department of Statistics, University of Washington.

Raftery, A.E. and Dean, N. (2006).
Variable Selection for Model-Based Clustering.
*Journal of the American Statistical Association*, 101, 168-178.

Steele, R., Raftery, A.E. and Emond, M. (2006).
Computing Normalizing Constants for Finite Mixture Models
via Incremental Mixture Importance Sampling (IMIS).
*Journal of Computational and Graphical Statistics*, 15, 712-734.

Forbes, F., Peyrard, N., Fraley, C., Georgian-Smith, D.,
Goldhaber, D.M., and Raftery, A.E. (2006).
Model-Based Region-of-Interest
Selection in Dynamic Breast MRI.
*Journal of Computer Assisted Tomography*, 30, 675-687.

Fraley, C. and Raftery, A.E. (2006).
Some applications of model-based clustering in chemistry.
*R News*, 6, no. 3, 17-23.

Fraley, C. and Raftery, A.E. (2006).
Model-based microarray image analysis.
*R News*, 6, no. 5, 60-63.

Fraley, C., Raftery, A.E. and Wehrens, R. (2005).
Incremental Model-Based Clustering for Large Datasets with Small Clusters.
*Journal of Computational and Graphical Statistics*, 14, 529-546.

Murtagh, F., Raftery, A.E. and Starck, J.-L. (2005).
Bayesian inference for multiband image segmentation via model-based cluster
trees.
*Image and Vision Computing*, 23, 587-596.

Dean, N. and Raftery, A.E. (2005).
Normal uniform mixture differential gene expression detection for
cDNA microarrays.
*BMC Bioinformatics*, 6, 173. (doi:10.1186/1471-2105-6-173).

Li, Q., Fraley, C., Bumgarner, R.E., Yeung, K.Y. and Raftery, A.E. (2005).
Donuts, Scratches and Blanks: Robust Model-Based Segmentation
of Microarray Images.
*Bioinformatics*, 21(12), 2875-2882
(doi:10.1093/bioinformatics/bti447).

Walsh, D.C.I. and Raftery, A.E. (2005).
Classification of mixtures of spatial point processes
via partial Bayes factors.
*Journal of Computational and Graphical Statistics*, 14, 139-154.

Wehrens, R., Buydens, L.M.C., Fraley, C. and Raftery, A.E. (2004).
Model-Based Clustering for Image Segmentation and Large Datasets Via Sampling.
*Journal of Classification*, 21, 231-253.

Fraley, C. and Raftery, A.E. (2002).
Model-Based Clustering,
Discriminant Analysis, and Density Estimation.
*Journal of the American Statistical Association*, 97, 611-631.

Fraley, C. and Raftery, A.E. (2002). MCLUST: Software for Model-Based Clustering, Density Estimation and Discriminant Analysis. Technical Report no. 415, Department of Statistics, University of Washington.

Murtagh, F., Raftery, A.E. and Starck, J.-L. (2001). Bayesian Inference for Color Quantization via Model-Based Clustering Trees. Technical Report no. 402, Department of Statistics, University of Washington.

Yeung, K.Y., Fraley, C., Murua, A., Raftery, A.E. and Ruzzo, W.L. (2001).
Model-based clustering and data transformations for gene expression data.
*Bioinformatics*, 17, 977-987.

This paper was identified by ISI Science Citation Index/Web of Science as one of the most
highly cited papers in Gene Expression Data. Lead author Ka Yee Yeung wrote a
commentary on the paper, published by ISI in its publication
*Fast Moving Fronts*.

Stanford, D.C. and Raftery, A.E. (2000).
Principal curve clustering with noise.
*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 22, 601-609.

Fraley, C. and Raftery, A.E. (1999).
MCLUST: Software for Model-Based Cluster Analysis.
*Journal of Classification*, 16, 297-306.

Campbell, J.G., Fraley, C., Stanford, D., Murtagh, F. and Raftery, A.E. (1999).
Model-based methods for textile fault detection.
*International Journal of Imaging Systems and Technology*, 10, 339-346.

Mukherjee, S., Feigelson, E.D., Babu, G.J., Murtagh, F., Fraley, C. and
Raftery, A.E. (1998).
Three types of gamma ray bursts.
*Astrophysical Journal*, 508, 314-327.

Fraley, C. and Raftery, A.E. (1998).
How many clusters? Which clustering
methods? Answers via model-based cluster analysis.
*Computer Journal*, 41, 578-588.

Dasgupta, A. and Raftery, A.E. (1998).
Detecting features in spatial point processes with clutter via model-based
clustering.
*Journal of the American Statistical Association*, 93, 294-302.

Campbell, J.G., Fraley, C., Murtagh, F. and Raftery, A.E. (1997).
Linear flaw detection in woven textiles using model-based clustering.
*Pattern Recognition Letters*, 18, 1539-1548.

Bensmail, H., Celeux, G., Raftery, A.E. and Robert, C. (1997).
Inference in model-based cluster analysis.
*Statistics and Computing*, 7, 1-10.

Banfield, J.D. and Raftery, A.E. (1993).
Model-based Gaussian and non-Gaussian clustering.
*Biometrics*, 49, 803-821.

Banfield, J.D. and Raftery, A.E. (1992).
Ice floe identification in
satellite images using mathematical morphology and clustering about
principal curves.
*Journal of the American Statistical Association*, 87, 7-16.

Murtagh, F. and Raftery, A.E. (1984).
Fitting straight lines to point patterns.
*Pattern Recognition*, 17, 479-483.

These papers are being made available here to facilitate the timely dissemination of scholarly work; copyright and all related rights are retained by the copyright holders.

Updated April 13, 2021.

Copyright 2005-2021 by Adrian E. Raftery; all rights reserved.