Abstract
This article evaluates a new Bayesian approach to determining the number of components in a finite mixture. Through simulation studies, we evaluate it on mixtures of normals and on latent class mixtures of Bernoulli responses. For normal mixtures we use a “gold standard” set of population models based on a well-known “testbed” data set, the galaxy recession velocity data set of Roeder (J Am Stat Assoc 85:617–624, 1990). For Bernoulli latent class mixtures we consider models for psychiatric diagnosis (Berkhof et al., Stat Sin 13:423–442, 2003). The new approach compares models with different numbers of components through their posterior deviance distributions, computed under non-informative or diffuse priors. Simulations show that even large numbers of closely spaced normal components can be identified with sufficiently large samples, while for latent classes with Bernoulli responses identification is more complex, though it again improves with increasing sample size.
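To make the idea of comparing posterior deviance distributions concrete, here is a minimal sketch for the simplest case, a single normal component. It assumes the diffuse prior p(mu, sigma^2) proportional to 1/sigma^2, which is an illustrative choice and not necessarily the prior used in the article. The function names `normal_deviance_draws` and `prob_smaller_deviance` are hypothetical, and summarising the comparison by the proportion of paired draws in which one model has the smaller deviance is likewise an assumption, not a statement of the article's exact rule. For models with two or more components, the deviance draws would have to come from an MCMC sampler, which is not shown.

```python
import math
import random


def normal_deviance_draws(y, n_draws, rng):
    """Posterior draws of the deviance -2 log L for a one-component normal model.

    Assumes the diffuse prior p(mu, sigma^2) proportional to 1/sigma^2
    (an illustrative assumption) and requires at least two observations.
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)  # sum of squares about the sample mean
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y ~ ss / chi-square(n - 1); chi-square(k) is Gamma(k/2, scale=2)
        sigma2 = ss / rng.gammavariate((n - 1) / 2.0, 2.0)
        # mu | sigma^2, y ~ Normal(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
        rss = ss + n * (ybar - mu) ** 2  # equals the sum of (y_i - mu)^2
        draws.append(n * math.log(2 * math.pi * sigma2) + rss / sigma2)
    return draws


def prob_smaller_deviance(dev_a, dev_b):
    """Proportion of paired draws in which model A has the smaller deviance.

    The two lists are assumed to be independent posterior samples,
    so pairing them by index is a valid Monte Carlo comparison.
    """
    m = min(len(dev_a), len(dev_b))
    return sum(dev_a[i] < dev_b[i] for i in range(m)) / m
```

A possible use: call `normal_deviance_draws(y, 10000, random.Random(1))` for the one-component model, obtain deviance draws for a two-component model from a separate sampler, and pass both lists to `prob_smaller_deviance`. A proportion near 1 would indicate that the first model's deviance distribution lies mostly below the second's.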
Notes
Full details can be found in their paper.
This may seem counter-intuitive, since the ML estimate of the saturated model always has the smallest frequentist deviance. However, the single observation in each “class” gives a very diffuse likelihood for each \(p_{ij}\), and this leads to a very diffuse and large deviance distribution.
References
Aitkin, M.: The calibration of p-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with discussion). Stat. Comput. 7, 253–272 (1997)
Aitkin, M.: Likelihood and Bayesian analysis of mixtures. Stat. Model. 1, 287–304 (2001)
Aitkin, M.: Statistical Inference: an Integrated Bayesian/Likelihood Approach. Chapman and Hall/CRC Press, Boca Raton (2010)
Aitkin, M.: How many components in a finite mixture? In: Mengersen, K.L., Robert, C.P., Titterington, D.M. (eds.) Mixtures Estimation and Applications. Wiley, Chichester (2011)
Bartlett, M.S.: A comment on D. V. Lindley’s statistical paradox. Biometrika 44, 533–534 (1957)
Berkhof, J., van Mechelen, I., Gelman, A.: A Bayesian approach to the selection and testing of mixture models. Stat. Sin. 13, 423–442 (2003)
Celeux, G., Forbes, F., Robert, C.P., Titterington, D.M.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651–674 (2006)
Dempster, A.P.: The direct use of likelihood in significance testing. Stat. Comput. 7, 247–252 (1997)
Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90, 577–588 (1995)
Garcia-Escudero, L.A., Gordaliza, A., Matran, C., Mayo-Iscar, A.: Avoiding spurious local maximizers in mixture modeling. Stat. Comput. (2015). doi:10.1007/s11222-014-9455-3
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Lindley, D.V.: A statistical paradox. Biometrika 44, 187–192 (1957)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
Nylund, K.L., Asparouhov, T., Muthen, B.O.: Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct. Equ. Model. 14, 535–569 (2007)
Phillips, D.B., Smith, A.F.M.: Bayesian model comparison via jump diffusions. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC Press, Boca Raton (1996)
Postman, M., Huchra, J.P., Geller, M.J.: Probes of large-scale structures in the Corona Borealis region. Astron. J. 92, 1238–1247 (1986)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. B 59, 731–792 (1997)
Roeder, K.: Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. J. Am. Stat. Assoc. 85, 617–624 (1990)
Roeder, K., Wasserman, L.: Practical Bayesian density estimation using mixtures of normals. J. Am. Stat. Assoc. 92, 894–902 (1997)
Spiegelhalter, D.J., Best, N., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–639 (2002)
Stephens, M.: Bayesian analysis of mixtures with an unknown number of components–an alternative to reversible jump methods. Ann. Stat. 28, 40–74 (2000)
Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550 (1987)
van Mechelen, I., De Boeck, P.: Implicit taxonomy in psychiatric diagnosis: a case study. J. Soc. Clin. Psychol. 8, 276–287 (1989)
Acknowledgments
We are grateful to the Australian Research Council for research support under project DP120102902, which supported Duy Vu for the period of this research (2012–2015) and visits by Brian Francis from the University of Lancaster.
Cite this article
Aitkin, M., Vu, D. & Francis, B. A new Bayesian approach for determining the number of components in a finite mixture. METRON 73, 155–176 (2015). https://doi.org/10.1007/s40300-015-0068-1
DOI: https://doi.org/10.1007/s40300-015-0068-1