Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

Regular Article
Advances in Data Analysis and Classification

Abstract

Finding a set of nested partitions of a dataset is useful for uncovering relevant structure at different scales, and is often handled with data-dependent methodologies. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Taking the integrated classification likelihood criterion as an objective function, this work applies to every discrete latent variable model (DLVM) for which this quantity is tractable. The first step of the methodology maximizes the criterion with respect to the partition. To address the known problem of sub-optimal local maxima found by greedy hill-climbing heuristics, we introduce a new hybrid algorithm based on a genetic algorithm that efficiently explores the space of solutions. The resulting algorithm carefully combines and merges different solutions, and allows the joint inference of the number K of clusters and of the clusters themselves. Starting from this natural partition, the second step of the methodology extracts a hierarchy of clusters with a bottom-up greedy procedure. In a Bayesian context, this is achieved by treating the Dirichlet cluster proportion prior parameter \(\alpha \) as a regularization term controlling the granularity of the clustering. A new approximation of the criterion is derived as a log-linear function of \(\alpha \), enabling a simple functional form of the merge decision criterion. This second step allows the exploration of the clustering at coarser scales. The proposed approach is compared with existing strategies on simulated as well as real settings, and its results are shown to be particularly relevant. A reference implementation of this work is available in the R package greed accompanying the paper.
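For orientation, the listing below gives a minimal usage sketch of the two steps with the accompanying greed package. The interface shown (the greed() fitting function, the Sbm() model constructor, and the clustering() and cut() accessors) reflects the package's documented API; treat it as an assumption and check the documentation of the installed version.

```r
# Minimal sketch of the two-step methodology with the greed package
# (assumed interface; see the package documentation).
library(greed)

# Simulate a small stochastic block model with 3 communities.
set.seed(42)
n  <- 150
z  <- sample(1:3, n, replace = TRUE)
Pi <- matrix(0.02, 3, 3)
diag(Pi) <- 0.25
P  <- Pi[z, z]                          # edge probabilities P[i, j] = Pi[z[i], z[j]]
A  <- matrix(rbinom(n^2, 1, P), n, n)   # binary adjacency matrix

# Step 1: hybrid genetic maximization of the exact ICL,
# jointly inferring K and the partition.
sol <- greed(A, model = Sbm())
table(clustering(sol), z)               # compare with the simulated labels

# Step 2: move along the extracted hierarchy to a coarser scale.
sol_coarse <- cut(sol, 2)
```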


Notes

  1. Or a product of Dirichlet distributions \({{\,\mathrm{Dir}\,}}_{{K_{r}}}(\varvec{\alpha }_r) \times {{\,\mathrm{Dir}\,}}_{{K_{c}}}(\varvec{\alpha }_c)\) in the case of co-clustering with the LBM. Apart from an additional notational burden, the rest of the discussion extends easily to this case, which is discussed in detail in the Supplementary Materials.

  2. Available at http://github.com/comeetie/greed

  3. Available at http://www-personal.umich.edu/~mejn/netdata/.

  4. Available at http://data.assemblee-nationale.fr/.

  5. Available at http://www.redhotjazz.com/.

References

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery (LinkKDD '05). ACM, Chicago, pp 36–43

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Andrews JL, McNicholas PD (2013) Using evolutionary algorithms for model-based clustering. Pattern Recognit Lett 34(9):987–992

  • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821

  • Bar-Joseph Z, Gifford DK, Jaakkola TS (2001) Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17(Suppl 1):S22–S29

  • Bates D, Maechler M (2019) Matrix: sparse and dense matrix classes and methods. R package version 1.2-17

  • Baudry J-P et al (2010) Combining mixture components for clustering. J Comput Graph Stat 19(2):332–353

  • Bengtsson H (2019) future: unified parallel and distributed processing in R for everyone. R package version 1.13.0

  • Bertoletti M, Friel N, Rastelli R (2015) Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion. METRON 73(2):177–199

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Inference 140:2991–3002

  • Blei DM, Kucukelbir A, McAuliffe JD (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112(518):859–877

  • Bouveyron C et al (2019) Model-based clustering and classification for data science: with applications in R, vol 50. Cambridge University Press

  • Cole RM (1998) Clustering with genetic algorithms

  • Côme E, Latouche P (2015) Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat Model 15(6):564–589

  • Côme E et al (2021) Supplementary materials: hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv Data Anal Classif

  • Corneli M, Latouche P, Rossi F (2016) Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing 192:81–91

  • Daudin J-J, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  • Eddelbuettel D, Balamuta JJ (2017) Extending R with C++: a brief introduction to Rcpp. PeerJ Preprints 5:e3188v1

  • Eddelbuettel D, Sanderson C (2014) RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput Stat Data Anal 71:1054–1063

  • Eiben AE, Smith JE (2004) Introduction to evolutionary computing, 2nd edn. Springer, Berlin

  • Everitt BS, Landau S, Leese M (2011) Cluster analysis, 5th edn. Wiley Series in Probability and Statistics. Wiley

  • Fraley C (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM J Sci Comput 20(1):270–281

  • Frühwirth-Schnatter S, Celeux G, Robert CP (2019) Handbook of mixture analysis. Chapman and Hall/CRC, London

  • Gelman A et al (2004) Bayesian data analysis, 2nd edn. Chapman & Hall/CRC, London

  • Gleiser PM, Danon L (2003) Community structure in jazz. Adv Complex Syst 6(4):565–573

  • Govaert G, Nadif M (2010) Latent block model for contingency table. Commun Stat Theory Methods 39(3):416–425

  • Heller KA, Ghahramani Z (2005) Bayesian hierarchical clustering. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 297–304

  • Hruschka ER et al (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133–155

  • Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(1):016107

  • Mariadassou M, Robin S, Vacher C (2010) Uncovering latent structure in valued graphs: a variational approach. Ann Appl Stat 4(2):715–742

  • Matias C, Robin S (2014) Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc Surv 47:55–74

  • McLachlan G, Peel D (2000) Finite mixture models. Wiley, Hoboken

  • McLachlan GJ, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, Hoboken

  • Murtagh F, Raftery AE (1984) Fitting straight lines to point patterns. Pattern Recognit 17(5):479–483

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

  • Newman MEJ, Reinert G (2016) Estimating the number of communities in a network. Phys Rev Lett 117(7):078301

  • Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087

  • Peixoto TP (2014) Hierarchical block structures and high-resolution model selection in large networks. Phys Rev X 4(1):011047

  • Qin T, Rohe K (2013) Regularized spectral clustering under the degree-corrected stochastic blockmodel. In: Advances in neural information processing systems (NIPS)

  • R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  • Riolo MA et al (2017) Efficient method for estimating the number of communities in a network. Phys Rev E 96(3):032310

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Scrucca L (2016) Genetic algorithms for subset selection in model-based clustering. In: Unsupervised learning algorithms. Springer, pp 55–70

  • Sneath PHA (1957) The application of computers to taxonomy. Microbiology 17(1):201–226

  • Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 38:1409–1438

  • Tessier D et al (2006) Evolutionary latent class clustering of qualitative data. Technical report

  • Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854

  • Wang YJ, Wong GY (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82(397):8–19

  • Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

  • Wyse J, Friel N, Latouche P (2017) Inferring structure in bipartite networks using the latent blockmodel and exact ICL. Netw Sci 5(1):45–69

  • Zhao Y, Levina E, Zhu J (2012) Consistency of community detection in networks under degree-corrected stochastic block models. Ann Stat 40(4):2266–2292

  • Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037

  • Zhu Y, Yan X, Moore C (2014) Oriented and degree-generated block models: generating and inferring communities with inhomogeneous degree distributions. J Complex Netw 2(1):1–18

  • Zreik R, Latouche P, Bouveyron C (2016) The dynamic random subgraph model for the clustering of evolving networks. Comput Stat


Acknowledgements

The authors would like to thank the editor and the two anonymous referees for their fruitful comments, which helped improve this paper.

Author information


Corresponding author

Correspondence to Nicolas Jouvin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 312 KB)

A Dealing with bipartitions

Genetic algorithm The hybrid algorithm presented in Sect. 2 extends easily to the co-clustering problem, which simultaneously seeks a partition of the n rows and of the p columns of a data matrix \(\varvec{X}\in {\mathbb {R}}^{n\times p}\). In this case, we work with a partition \(\mathcal {P}\) of \(\{1,\ldots , n+p\}\) with the additional constraint that it decomposes into two disjoint sets of clusters, corresponding to a partition of \(\{1,\ldots ,n\}\) (the rows) and a partition of \(\{n+1,\ldots ,n+p\}\) (the columns) respectively:

$$\begin{aligned} \mathcal {P} =\left\{ C_1^r,\ldots ,C_{{K_{r}}}^r,C_1^c,\ldots ,C_{{K_{c}}}^c\right\} : \left\{ \begin{array}{ll} \bigcup _{k}C^r_k &= \{1,\ldots ,n\},\\ \bigcup _{l}C^c_l &= \{n+1,\ldots ,n+p\} \end{array} \right. \end{aligned}$$
(15)

This constraint is easily incorporated by setting \({{\,\mathrm{ICL_{\textit{ex}}}\,}}(\mathcal {P})=-\infty \) for partitions that do not fulfill it and by initializing the algorithm with admissible solutions, as sketched below. This suffices to guarantee that the solutions obtained remain admissible, since the set of admissible partitions is closed under the crossover and mutation operations used by the algorithm.
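For illustration only, here is a minimal R sketch of this guard. It is not the package's internal code: partitions are represented as plain lists of index vectors, and icl_exact is a placeholder for any model-specific exact ICL function.

```r
# Hypothetical sketch: score a candidate bipartition, rejecting inadmissible
# ones through the objective itself. A partition is a list of integer vectors;
# rows are indices 1..n and columns are indices (n+1)..(n+p).
icl_admissible <- function(partition, n, icl_exact) {
  mixed <- vapply(
    partition,
    function(cl) any(cl <= n) && any(cl > n),  # cluster mixing rows and columns?
    logical(1)
  )
  if (any(mixed)) return(-Inf)  # inadmissible: can never be selected
  icl_exact(partition)          # otherwise, the usual exact ICL
}
```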

Hierarchical algorithm Furthermore, the hierarchical methodology of Sect. 3 also extends easily to bipartitions. Indeed, the LBM prior

$$\begin{aligned} p(\varvec{\pi }\mid \alpha ) = {{\,\mathrm{Dir}\,}}_{{K_{r}}}(\varvec{\pi }^{r}\mid \alpha ) \times {{\,\mathrm{Dir}\,}}_{{K_{c}}}(\varvec{\pi }^{c}\mid \alpha ), \end{aligned}$$

yields a factorized integrated likelihood for \(p(\varvec{Z}\mid \alpha )\), with a common parameter \(\alpha \) (see Côme et al. 2021 for a detailed discussion). Thus, the \({{\,\mathrm{ICL_{\textit{lin}}}\,}}\) approximation of Equation (10) remains log-linear in \(\alpha \) and reads:

$$\begin{aligned} {{\,\mathrm{ICL_{\textit{lin}}}\,}}(\varvec{Z}, \alpha ) = ({K_{r}}-1) \log (\alpha ) + ({K_{c}}- 1) \log (\alpha ) + I(\varvec{Z}), \end{aligned}$$
(16)

with \(I(\varvec{Z}) = I(\varvec{Z}^{r}) + I(\varvec{Z}^{c})\) the sum of the intercepts defined in Equation (9). Hence, under the constraint that a merge cannot occur between a row cluster and a column cluster, one looks for the best row or column fusion at each step, thereby building two dendrograms in parallel with a shared \((\alpha _f)_f\) sequence, as sketched below.
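To make the merge decision concrete, the following illustrative R sketch (a hypothetical helper, not the greed internals) selects the next fusion. It assumes the intercept losses \(I(\varvec{Z}) - I(\varvec{Z}')\) of all admissible row–row and column–column fusions have been computed; since any fusion decreases \({K_{r}}+{K_{c}}\) by one, Equation (16) implies that it improves \({{\,\mathrm{ICL_{\textit{lin}}}\,}}\) exactly when \(\log \alpha \le -(I(\varvec{Z}) - I(\varvec{Z}'))\).

```r
# Hypothetical sketch: pick the next fusion in the bipartite hierarchy.
# loss_row / loss_col: numeric vectors of intercept losses I(Z) - I(Z')
# over candidate row-row and column-column fusions.
best_fusion <- function(loss_row, loss_col) {
  losses <- c(row = min(loss_row), col = min(loss_col))
  side   <- names(which.min(losses))   # merge on the row or the column side
  loss   <- losses[[side]]
  list(
    side    = side,
    alpha_f = exp(-loss)  # fusion improves ICL_lin for every alpha <= alpha_f
  )
}

# Example: the cheapest candidate is a column merge, beneficial once
# alpha drops below exp(-0.8).
best_fusion(loss_row = c(2.1, 1.3), loss_col = c(0.8, 4.0))
```

The greedy procedure applies this rule repeatedly, so the recorded thresholds \((\alpha _f)_f\) form the shared regularization path along which both dendrograms are cut.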


About this article


Cite this article

Côme, E., Jouvin, N., Latouche, P. et al. Hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv Data Anal Classif 15, 957–986 (2021). https://doi.org/10.1007/s11634-021-00440-z

