Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

Regular Article
Advances in Data Analysis and Classification

Abstract

Finding a set of nested partitions of a dataset is useful for uncovering relevant structure at different scales, and is often handled with data-dependent methodologies. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Taking the integrated classification likelihood criterion as an objective function, this work applies to every discrete latent variable model (DLVM) for which this quantity is tractable. The first step of the methodology maximizes the criterion with respect to the partition. To address the known problem of sub-optimal local maxima found by greedy hill-climbing heuristics, we introduce a new hybrid algorithm based on a genetic algorithm that efficiently explores the space of solutions. The resulting algorithm carefully combines and merges different solutions, and allows the joint inference of the number K of clusters and of the clusters themselves. Starting from this natural partition, the second step of the methodology extracts a hierarchy of clusters with a bottom-up greedy procedure. In a Bayesian context, this is achieved by treating the Dirichlet cluster proportion prior parameter \(\alpha \) as a regularization term controlling the granularity of the clustering. A new approximation of the criterion is derived as a log-linear function of \(\alpha \), enabling a simple functional form of the merge decision criterion. This second step allows the exploration of the clustering at coarser scales. The proposed approach is compared with existing strategies on simulated as well as real settings, and its results are shown to be particularly relevant. A reference implementation of this work is available in the R package greed accompanying the paper.
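For orientation, the listing below gives a minimal usage sketch of the two steps with the accompanying greed package. The interface shown (the greed() fitting function, the Sbm() model constructor, and the clustering() and cut() accessors) reflects the package's documented API; treat it as an assumption and check the documentation of the installed version.

```r
# Minimal sketch of the two-step methodology with the greed package
# (assumed interface; see the package documentation).
library(greed)

# Simulate a small stochastic block model with 3 communities.
set.seed(42)
n  <- 150
z  <- sample(1:3, n, replace = TRUE)
Pi <- matrix(0.02, 3, 3)
diag(Pi) <- 0.25
P  <- Pi[z, z]                          # edge probabilities P[i, j] = Pi[z[i], z[j]]
A  <- matrix(rbinom(n^2, 1, P), n, n)   # binary adjacency matrix

# Step 1: hybrid genetic maximization of the exact ICL,
# jointly inferring K and the partition.
sol <- greed(A, model = Sbm())
table(clustering(sol), z)               # compare with the simulated labels

# Step 2: move along the extracted hierarchy to a coarser scale.
sol_coarse <- cut(sol, 2)
```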


Notes

  1. Or a product of Dirichlet distributions \({{\,\mathrm{Dir}\,}}_{{K_{r}}}(\varvec{\alpha }_r) \times {{\,\mathrm{Dir}\,}}_{{K_{c}}}(\varvec{\alpha }_c)\) in the case of co-clustering with the LBM. Apart from an additional notational burden, the rest of the discussion extends easily to this case, which is discussed in detail in the Supplementary Materials.

  2. Available at http://github.com/comeetie/greed

  3. Available at http://www-personal.umich.edu/~mejn/netdata/.

  4. Available at http://data.assemblee-nationale.fr/.

  5. Available at http://www.redhotjazz.com/.

References

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery (LinkKDD '05). ACM, Chicago, pp 36–43

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Andrews JL, McNicholas PD (2013) Using evolutionary algorithms for model-based clustering. Pattern Recognit Lett 34(9):987–992

  • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821

  • Bar-Joseph Z, Gifford DK, Jaakkola TS (2001) Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17(Suppl 1):S22–S29

  • Bates D, Maechler M (2019) Matrix: sparse and dense matrix classes and methods. R package version 1.2-17

  • Baudry J-P et al (2010) Combining mixture components for clustering. J Comput Graph Stat 19(2):332–353

  • Bengtsson H (2019) future: unified parallel and distributed processing in R for everyone. R package version 1.13.0

  • Bertoletti M, Friel N, Rastelli R (2015) Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion. METRON 73(2):177–199

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Inference 140:2991–3002

  • Blei DM, Kucukelbir A, McAuliffe JD (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112(518):859–877

  • Bouveyron C et al (2019) Model-based clustering and classification for data science: with applications in R, vol 50. Cambridge University Press

  • Cole RM (1998) Clustering with genetic algorithms

  • Côme E, Latouche P (2015) Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat Model 15(6):564–589

  • Côme E et al (2021) Supplementary materials: hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv Data Anal Classif

  • Corneli M, Latouche P, Rossi F (2016) Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing 192:81–91

  • Daudin J-J, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  • Eddelbuettel D, Balamuta JJ (2017) Extending R with C++: a brief introduction to Rcpp. PeerJ Preprints 5:e3188v1

  • Eddelbuettel D, Sanderson C (2014) RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput Stat Data Anal 71:1054–1063

  • Eiben AE, Smith JE (2004) Introduction to evolutionary computing, 2nd edn. Springer, Berlin

  • Everitt BS, Landau S, Leese M (2011) Cluster analysis, 5th edn. Wiley Series in Probability and Statistics. Wiley

  • Fraley C (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM J Sci Comput 20(1):270–281

  • Frühwirth-Schnatter S, Celeux G, Robert CP (2019) Handbook of mixture analysis. Chapman and Hall/CRC, London

  • Gelman A et al (2004) Bayesian data analysis, 2nd edn. Chapman & Hall/CRC, London

  • Gleiser PM, Danon L (2003) Community structure in jazz. Adv Complex Syst 6(4):565–573

  • Govaert G, Nadif M (2010) Latent block model for contingency table. Commun Stat Theory Methods 39(3):416–425

  • Heller KA, Ghahramani Z (2005) Bayesian hierarchical clustering. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 297–304

  • Hruschka ER et al (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133–155

  • Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(1):016107

  • Mariadassou M, Robin S, Vacher C (2010) Uncovering latent structure in valued graphs: a variational approach. Ann Appl Stat 4(2):715–742

  • Matias C, Robin S (2014) Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc Surv 47:55–74

  • McLachlan G, Peel D (2000) Finite mixture models. Wiley, Hoboken

  • McLachlan GJ, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, Hoboken

  • Murtagh F, Raftery AE (1984) Fitting straight lines to point patterns. Pattern Recognit 17(5):479–483

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

  • Newman MEJ, Reinert G (2016) Estimating the number of communities in a network. Phys Rev Lett 117(7):078301

  • Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087

  • Peixoto TP (2014) Hierarchical block structures and high-resolution model selection in large networks. Phys Rev X 4(1):011047

  • Qin T, Rohe K (2013) Regularized spectral clustering under the degree-corrected stochastic blockmodel. In: Advances in neural information processing systems (NIPS)

  • R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  • Riolo MA et al (2017) Efficient method for estimating the number of communities in a network. Phys Rev E 96(3):032310

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Scrucca L (2016) Genetic algorithms for subset selection in model-based clustering. In: Unsupervised learning algorithms. Springer, pp 55–70

  • Sneath PHA (1957) The application of computers to taxonomy. Microbiology 17(1):201–226

  • Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 38:1409–1438

  • Tessier D et al (2006) Evolutionary latent class clustering of qualitative data. Technical report

  • Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854

  • Wang YJ, Wong GY (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82(397):8–19

  • Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

  • Wyse J, Friel N, Latouche P (2017) Inferring structure in bipartite networks using the latent blockmodel and exact ICL. Netw Sci 5(1):45–69

  • Zhao Y, Levina E, Zhu J (2012) Consistency of community detection in networks under degree-corrected stochastic block models. Ann Stat 40(4):2266–2292

  • Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037

  • Zhu Y, Yan X, Moore C (2014) Oriented and degree-generated block models: generating and inferring communities with inhomogeneous degree distributions. J Complex Netw 2(1):1–18

  • Zreik R, Latouche P, Bouveyron C (2016) The dynamic random subgraph model for the clustering of evolving networks. Comput Stat


Acknowledgements

The authors would like to thank the editor and the two anonymous referees for their fruitful comments, which helped improve this paper.

Author information


Corresponding author

Correspondence to Nicolas Jouvin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 312 KB)

A Dealing with bipartitions

Genetic algorithm The hybrid algorithm presented in Sect. 2 extends easily to the co-clustering problem, which simultaneously seeks a partition of the n rows and of the p columns of a data matrix \(\varvec{X}\in {\mathbb {R}}^{n\times p}\). In this case, we work with a partition \(\mathcal {P}\) of \(\{1,\ldots , n+p\}\) with the additional constraint that it decomposes into two disjoint sets of clusters, corresponding to a partition of \(\{1,\ldots ,n\}\) (the rows) and a partition of \(\{n+1,\ldots ,n+p\}\) (the columns) respectively:

$$\begin{aligned} \mathcal {P} =\left\{ C_1^r,\ldots ,C_{{K_{r}}}^r,C_1^c,\ldots ,C_{{K_{c}}}^c\right\} : \left\{ \begin{array}{ll} \bigcup _{k}C^r_k &= \{1,\ldots ,n\},\\ \bigcup _{l}C^c_l &= \{n+1,\ldots ,n+p\} \end{array} \right. \end{aligned}$$
(15)

This constraint is easily incorporated by setting \({{\,\mathrm{ICL_{\textit{ex}}}\,}}(\mathcal {P})=-\infty \) for partitions that do not fulfill it and by initializing the algorithm with admissible solutions, as sketched below. This suffices to guarantee that the solutions obtained remain admissible, since the set of admissible partitions is closed under the crossover and mutation operations used by the algorithm.
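For illustration only, here is a minimal R sketch of this guard. It is not the package's internal code: partitions are represented as plain lists of index vectors, and icl_exact is a placeholder for any model-specific exact ICL function.

```r
# Hypothetical sketch: score a candidate bipartition, rejecting inadmissible
# ones through the objective itself. A partition is a list of integer vectors;
# rows are indices 1..n and columns are indices (n+1)..(n+p).
icl_admissible <- function(partition, n, icl_exact) {
  mixed <- vapply(
    partition,
    function(cl) any(cl <= n) && any(cl > n),  # cluster mixing rows and columns?
    logical(1)
  )
  if (any(mixed)) return(-Inf)  # inadmissible: can never be selected
  icl_exact(partition)          # otherwise, the usual exact ICL
}
```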

Hierarchical algorithm Furthermore, the hierarchical methodology of Sect. 3 also extends easily to bipartitions. Indeed, the LBM prior

$$\begin{aligned} p(\varvec{\pi }\mid \alpha ) = {{\,\mathrm{Dir}\,}}_{{K_{r}}}(\varvec{\pi }^{r}\mid \alpha ) \times {{\,\mathrm{Dir}\,}}_{{K_{c}}}(\varvec{\pi }^{c}\mid \alpha ), \end{aligned}$$

yields a factorized integrated likelihood for \(p(\varvec{Z}\mid \alpha )\), with a common parameter \(\alpha \) (see Côme et al. 2021 for a detailed discussion). Thus, the \({{\,\mathrm{ICL_{\textit{lin}}}\,}}\) approximation of Equation (10) remains log-linear in \(\alpha \) and reads:

$$\begin{aligned} {{\,\mathrm{ICL_{\textit{lin}}}\,}}(\varvec{Z}, \alpha ) = ({K_{r}}-1) \log (\alpha ) + ({K_{c}}- 1) \log (\alpha ) + I(\varvec{Z}), \end{aligned}$$
(16)

with \(I(\varvec{Z}) = I(\varvec{Z}^{r}) + I(\varvec{Z}^{c})\) the sum of the intercepts defined in Equation (9). Hence, under the constraint that a merge cannot occur between a row cluster and a column cluster, one looks for the best row or column fusion at each step, thereby building two dendrograms in parallel with a shared \((\alpha _f)_f\) sequence, as sketched below.
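To make the merge decision concrete, the following illustrative R sketch (a hypothetical helper, not the greed internals) selects the next fusion. It assumes the intercept losses \(I(\varvec{Z}) - I(\varvec{Z}')\) of all admissible row–row and column–column fusions have been computed; since any fusion decreases \({K_{r}}+{K_{c}}\) by one, Equation (16) implies that it improves \({{\,\mathrm{ICL_{\textit{lin}}}\,}}\) exactly when \(\log \alpha \le -(I(\varvec{Z}) - I(\varvec{Z}'))\).

```r
# Hypothetical sketch: pick the next fusion in the bipartite hierarchy.
# loss_row / loss_col: numeric vectors of intercept losses I(Z) - I(Z')
# over candidate row-row and column-column fusions.
best_fusion <- function(loss_row, loss_col) {
  losses <- c(row = min(loss_row), col = min(loss_col))
  side   <- names(which.min(losses))   # merge on the row or the column side
  loss   <- losses[[side]]
  list(
    side    = side,
    alpha_f = exp(-loss)  # fusion improves ICL_lin for every alpha <= alpha_f
  )
}

# Example: the cheapest candidate is a column merge, beneficial once
# alpha drops below exp(-0.8).
best_fusion(loss_row = c(2.1, 1.3), loss_col = c(0.8, 4.0))
```

The greedy procedure applies this rule repeatedly, so the recorded thresholds \((\alpha _f)_f\) form the shared regularization path along which both dendrograms are cut.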


About this article


Cite this article

Côme, E., Jouvin, N., Latouche, P. et al. Hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv Data Anal Classif 15, 957–986 (2021). https://doi.org/10.1007/s11634-021-00440-z

