A new Bayesian approach for determining the number of components in a finite mixture

Aitkin, Murray; Vu, Duy; Francis, Brian

doi:10.1007/s40300-015-0068-1

Published: 09 July 2015

METRON, Volume 73, pages 155–176 (2015)

Murray Aitkin¹, Duy Vu¹ & Brian Francis²

Abstract

This article evaluates a new Bayesian approach to determining the number of components in a finite mixture. Through simulation studies, we evaluate the approach on mixtures of normals and on latent class mixtures of Bernoulli responses. For normal mixtures we use a “gold standard” set of population models based on a well-known “testbed” data set: the galaxy recession velocity data set of Roeder (J Am Stat Assoc 85:617–624, 1990). For Bernoulli latent class mixtures we consider the models for psychiatric diagnosis of Berkhof et al. (Stat Sin 13:423–442, 2003). The new approach compares models with different numbers of components through their posterior deviance distributions, computed under non-informative or diffuse priors. Simulations show that even large numbers of closely spaced normal components can be identified with sufficiently large samples, while for latent classes with Bernoulli responses identification is more complex, though it again improves with increasing sample size.
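As a rough illustration of what comparing posterior deviance distributions could look like in practice, the Python sketch below computes the deviance (minus twice the log-likelihood) of a normal mixture at a single parameter draw, then compares two sets of deviance draws. It is not the authors' procedure, which is not given in this abstract. The function names `mixture_deviance` and `prob_smaller_deviance` are hypothetical, the posterior draws are assumed to come from some sampler that is not shown, and the pairwise comparison rule is an assumption made only for illustration.

```python
import numpy as np


def mixture_deviance(y, weights, means, sds):
    """Deviance, -2 * log-likelihood, of a normal mixture at ONE parameter draw.

    y is a 1-D sample; weights, means and sds each have one entry per component.
    """
    y = np.asarray(y, dtype=float)[:, None]              # shape (n, 1)
    weights = np.asarray(weights, dtype=float)[None, :]  # shape (1, K)
    means = np.asarray(means, dtype=float)[None, :]
    sds = np.asarray(sds, dtype=float)[None, :]

    # Log of weight_k * N(y_i | mean_k, sd_k) for every observation and component.
    log_comp = (
        np.log(weights)
        - np.log(sds)
        - 0.5 * np.log(2.0 * np.pi)
        - 0.5 * ((y - means) / sds) ** 2
    )

    # Log-sum-exp over components, for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -2.0 * float(log_lik.sum())


def prob_smaller_deviance(dev_a, dev_b):
    """Fraction of all (a, b) pairs of draws in which model A's deviance is smaller.

    Illustrative comparison rule (an assumption, not taken from the article).
    """
    dev_a = np.asarray(dev_a, dtype=float)[:, None]
    dev_b = np.asarray(dev_b, dtype=float)[None, :]
    return float((dev_a < dev_b).mean())
```

In use, one would evaluate `mixture_deviance` at each posterior draw for a K-component model and for a (K+1)-component model, then pass the two arrays of deviance draws to `prob_smaller_deviance`. A value near 1 would suggest that the first model's deviance distribution sits mostly below the second's.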

Notes

Full details can be found in their paper.
This may seem counter-intuitive, since the ML estimate of the saturated model always has the smallest frequentist deviance. However, the single observation in each “class” gives a very diffuse likelihood for each \(p_{ij}\), which leads to a very diffuse and large deviance distribution.

References

Aitkin, M.: The calibration of p-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood (with discussion). Stat. Comput. 7, 253–272 (1997)
Aitkin, M.: Likelihood and Bayesian analysis of mixtures. Stat. Model. 1, 287–304 (2001)
Aitkin, M.: Statistical Inference: an Integrated Bayesian/Likelihood Approach. Chapman and Hall/CRC Press, Boca Raton (2010)
Aitkin, M.: How many components in a finite mixture? In: Mengersen, K.L., Robert, C.P., Titterington, D.M. (eds.) Mixtures Estimation and Applications. Wiley, Chichester (2011)
Bartlett, M.S.: A comment on D. V. Lindley’s statistical paradox. Biometrika 44, 533–534 (1957)
Berkhof, J., van Mechelen, I., Gelman, A.: A Bayesian approach to the selection and testing of mixture models. Stat. Sin. 13, 423–442 (2003)
Celeux, G., Forbes, F., Robert, C.P., Titterington, D.M.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651–674 (2006)
Dempster, A.P.: The direct use of likelihood in significance testing. Stat. Comput. 7, 247–252 (1997)
Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90, 577–588 (1995)
Garcia-Escudero, L.A., Gordaliza, A., Matran, C., Mayo-Iscar, A.: Avoiding spurious local maximizers in mixture modeling. Stat. Comput. (2015). doi:10.1007/s11222-014-9455-3
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Lindley, D.V.: A statistical paradox. Biometrika 44, 187–192 (1957)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
Nylund, K.L., Asparouhov, T., Muthen, B.O.: Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct. Equ. Model. 14, 535–569 (2007)
Phillips, D.B., Smith, A.F.M.: Bayesian model comparison via jump diffusions. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC Press, Boca Raton (1996)
Postman, M., Huchra, J.P., Geller, M.J.: Probes of large-scale structures in the Corona Borealis region. Astron. J. 92, 1238–1247 (1986)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. B 59, 731–792 (1997)
Roeder, K.: Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. J. Am. Stat. Assoc. 85, 617–624 (1990)
Roeder, K., Wasserman, L.: Practical Bayesian density estimation using mixtures of normals. J. Am. Stat. Assoc. 92, 894–902 (1997)
Spiegelhalter, D.J., Best, N., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–639 (2002)
Stephens, M.: Bayesian analysis of mixtures with an unknown number of components–an alternative to reversible jump methods. Ann. Stat. 28, 40–74 (2000)
Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550 (1987)
van Mechelen, I., De Boeck, P.: Implicit taxonomy in psychiatric diagnosis: a case study. J. Soc. Clin. Psychol. 8, 276–287 (1989)

Acknowledgments

We are grateful to the Australian Research Council for research support under project DP120102902, which supported Duy Vu for the period of this research (2012–2015), and for visits by Brian Francis from the University of Lancaster.

Author information

Authors and Affiliations

University of Melbourne, Melbourne, VIC, Australia
Murray Aitkin & Duy Vu
University of Lancaster, Lancaster, UK
Brian Francis

Corresponding author

Correspondence to Murray Aitkin.

About this article

Cite this article

Aitkin, M., Vu, D. & Francis, B. A new Bayesian approach for determining the number of components in a finite mixture. METRON 73, 155–176 (2015). https://doi.org/10.1007/s40300-015-0068-1

Received: 10 October 2014
Accepted: 17 June 2015
Published: 09 July 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s40300-015-0068-1
