Model-based clustering with determinant-and-shape constraint

García-Escudero, Luis Angel; Mayo-Iscar, Agustín; Riani, Marco

doi:10.1007/s11222-020-09950-w

Published: 29 May 2020

Statistics and Computing, volume 30, pages 1363–1380 (2020)

Luis Angel García-Escudero ORCID: orcid.org/0000-0002-7617-3034¹,
Agustín Mayo-Iscar¹ &
Marco Riani²

Abstract

Model-based approaches to cluster analysis and mixture modeling often involve maximizing classification and mixture likelihoods. Without appropriate constraints on the scatter matrices of the components, these maximizations are ill-posed problems. Moreover, without constraints, the EM and CEM algorithms traditionally used to maximize the likelihood criteria often detect uninteresting or “spurious” clusters. Imposing an upper bound on the maximal ratio between the determinants of the scatter matrices seems a sensible way to overcome these problems with affine equivariant constraints. Unfortunately, problems still arise unless the elements of the “shape” matrices are also controlled. A new methodology is proposed that controls both the determinants of the scatter matrices and the elements of the shape matrices. Some theoretical justification is given, and a fast algorithm is proposed for this doubly constrained maximization. The methodology is also extended to robust model-based clustering problems.
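
The two kinds of constraint described in the abstract can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' algorithm: it assumes each scatter matrix is split into its determinant and a vector of "shape" elements (the eigenvalues rescaled so that their product is 1), and it only bounds the ratio of shape elements within each component. The function names and the bounds `c_det` and `c_shape` are hypothetical names chosen for this illustration.

```python
import numpy as np

def det_and_shape(sigma):
    """Split a symmetric positive definite matrix into its determinant
    and its shape elements (eigenvalues rescaled to have product 1)."""
    p = sigma.shape[0]
    eigvals = np.linalg.eigvalsh(sigma)       # eigenvalues of a symmetric matrix
    det = float(np.prod(eigvals))             # determinant = product of eigenvalues
    shape = eigvals / det ** (1.0 / p)        # product of these elements is 1
    return det, shape

def satisfies_constraints(sigmas, c_det, c_shape):
    """Check both bounds (c_det and c_shape are assumed names):
    max/min determinant across components <= c_det, and
    max/min shape element within every component <= c_shape."""
    dets, shapes = zip(*(det_and_shape(s) for s in sigmas))
    det_ratio = max(dets) / min(dets)
    shape_ratio = max(s.max() / s.min() for s in shapes)
    return det_ratio <= c_det and shape_ratio <= c_shape

# Three toy 2x2 scatter matrices.
A = np.diag([1.0, 1.0])      # spherical, determinant 1
B = np.diag([4.0, 1.0])      # determinant 4, mildly elongated
C = np.diag([100.0, 0.01])   # determinant 1, almost degenerate (very elongated)

ok_mild = satisfies_constraints([A, B], c_det=5.0, c_shape=5.0)
ok_degenerate = satisfies_constraints([A, C], c_det=2.0, c_shape=10.0)
```

In this toy example, `A` and `C` have the same determinant, so a bound on the determinant ratio alone would accept the nearly degenerate component `C`; it is the shape bound that rejects it, which is the situation the abstract describes.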

Author information

Authors and Affiliations

Department of Statistics and Operational Research and IMUVA, University of Valladolid, Valladolid, Spain
Luis Angel García-Escudero & Agustín Mayo-Iscar
Department of Economics and Management and Interdepartmental Centre of Robust Statistics, University of Parma, Parma, Italy
Marco Riani

Corresponding author

Correspondence to Luis Angel García-Escudero.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research is partially supported by the Spanish Ministerio de Economía y Competitividad, Grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, Grants VA005P17 and VA002G18. This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy. M.R. gratefully acknowledges support from the CRoNoS project, reference CRoNoS COST Action IC1408, and the University of Parma project “Statistics for fraud detection, with applications to trade data and financial statement”. The authors also thank the editor, the associate editor, and the anonymous referees for their constructive comments.

About this article

Cite this article

García-Escudero, L.A., Mayo-Iscar, A. & Riani, M. Model-based clustering with determinant-and-shape constraint. Stat Comput 30, 1363–1380 (2020). https://doi.org/10.1007/s11222-020-09950-w

Received: 10 January 2019
Accepted: 20 May 2020
Published: 29 May 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11222-020-09950-w
