Abstract
Hierarchical agglomerative clustering (HAC) with Ward’s linkage has been widely used since its introduction by Ward (Journal of the American Statistical Association, 58(301), 236–244, 1963). This article reviews extensions of HAC to various types of input data and to contiguity-constrained HAC, and provides conditions for their applicability. In addition, different versions of the graphical representation of the results as a dendrogram are presented and their properties are clarified. We unify and complete the results scattered across a heterogeneous literature, using a common framework. In particular, this study reveals an important distinction between a consistency property of the dendrogram and the absence of crossover within it. Finally, a simulation study shows that the constrained version of HAC can sometimes provide more relevant results than its unconstrained version, even though the constraint restricts the optimization of the objective criterion to a reduced set of solutions at each step. Overall, this article provides comprehensive recommendations, both for the use of HAC and constrained HAC depending on the input data, and for the representation of the results.
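To make the difference between unconstrained and contiguity-constrained Ward HAC concrete, here is a minimal sketch on one-dimensional observations. It assumes that Ward's linkage is the increase in the error sum of squares (ESS) caused by a merger, and that the contiguity constraint only allows clusters that are adjacent in the input order to merge. The function names and the toy data are hypothetical illustrations, not the article's implementation. With the input order 0, 10, 1, the constraint forbids merging the close values 0 and 1 first, and the second merge ends up lower than the first, so the merge heights are no longer monotone.

```python
from typing import List, Tuple

Cluster = Tuple[int, float]  # (size, centroid) of a cluster of 1-D observations


def ward_linkage(a: Cluster, b: Cluster) -> float:
    """Increase in the error sum of squares (ESS) caused by merging a and b."""
    (n_a, g_a), (n_b, g_b) = a, b
    return n_a * n_b / (n_a + n_b) * (g_a - g_b) ** 2


def ward_hac(x: List[float], contiguous: bool) -> List[float]:
    """Return the merge heights (ESS increases) of Ward HAC on 1-D data.

    With contiguous=True, only clusters adjacent in the input order may merge.
    Ties are broken by keeping the first minimal pair found (min() behavior).
    """
    clusters: List[Cluster] = [(1, float(v)) for v in x]  # kept in input order
    heights: List[float] = []
    while len(clusters) > 1:
        k = len(clusters)
        if contiguous:
            pairs = [(i, i + 1) for i in range(k - 1)]
        else:
            pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        i, j = min(pairs, key=lambda p: ward_linkage(clusters[p[0]], clusters[p[1]]))
        heights.append(ward_linkage(clusters[i], clusters[j]))
        (n_i, g_i), (n_j, g_j) = clusters[i], clusters[j]
        # The merged cluster takes the place of cluster i (i < j); in the
        # constrained case j = i + 1, so the input order is preserved.
        clusters[i] = (n_i + n_j, (n_i * g_i + n_j * g_j) / (n_i + n_j))
        del clusters[j]
    return heights


def has_height_decrease(heights: List[float]) -> bool:
    """True if some merge is drawn lower than the merge just before it."""
    return any(later < earlier for earlier, later in zip(heights, heights[1:]))


x = [0.0, 10.0, 1.0]
print(ward_hac(x, contiguous=False))  # approximately [0.5, 60.17]
print(ward_hac(x, contiguous=True))   # approximately [40.5, 20.17]
```

In both cases the heights add up to the total ESS of the data (182/3 here), because each merger adds its own ESS increase; the constraint only changes which partitions are visited along the way.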
Notes
In the rare situation when the minimal linkage is achieved by more than one merger, a choice between these mergers has to be made. Different choices are made by different implementations of HAC.
In some cases, similarity measures are also supposed to take non-negative values, but we will not make this assumption in the present article.
The detailed analysis of all examples and counter-examples of this section is provided in Appendix 2.
The pre-processed and normalized data have been downloaded from the authors’ website at http://chromosome.sdsc.edu/mouse/hi-c/download.html (raw sequence data are also published on the GEO website, accession number GSE35156).
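The first note, on ties in the minimal linkage, can be illustrated with three equally spaced points: the two adjacent pairs share the same Ward linkage, so the first merger depends on the tie-breaking rule. The sketch below is hypothetical; the function names and the two rules (keep the first or the last tied pair in lexicographic order of indices) are assumptions for illustration, not the rules of any specific implementation.

```python
from itertools import combinations
from typing import List, Tuple


def singleton_ward_linkage(x_a: float, x_b: float) -> float:
    """Ward linkage (ESS increase) between two singletons: half the squared distance."""
    return 0.5 * (x_a - x_b) ** 2


def first_merger(x: List[float], prefer_last: bool) -> Tuple[int, int]:
    """Indices of the pair merged first by Ward HAC on 1-D data.

    Hypothetical tie-breaking rules: keep the first (prefer_last=False) or the
    last (prefer_last=True) tied pair in lexicographic order of indices.
    """
    pairs = list(combinations(range(len(x)), 2))
    best = min(singleton_ward_linkage(x[i], x[j]) for i, j in pairs)
    # Exact comparison is enough for this toy example; real data would need a tolerance.
    tied = [(i, j) for i, j in pairs if singleton_ward_linkage(x[i], x[j]) == best]
    return tied[-1] if prefer_last else tied[0]


x = [0.0, 1.0, 2.0]  # pairs (0, 1) and (1, 2) both have linkage 0.5
print(first_merger(x, prefer_last=False))  # (0, 1)
print(first_merger(x, prefer_last=True))   # (1, 2)
```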
References
Ah-Pine, J., & Wang, X. (2016). Similarity based hierarchical clustering with an application to text collections. In Boström, H., Knobbe, A., Soares, C., & Papapetrou, P. (Eds.) Proceedings of the 15th International Symposium on Intelligent Data Analysis (IDA 2016), Lecture Notes in Computer Science (pp. 320–331). Stockholm.
Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., Vialaneix, N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14, 22.
Arlot, S., Brault, V., Baudry, J.-P., Maugis, C., Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.
Arlot, S., Celisse, A., Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research, 20. arXiv:1202.3878v3. https://jmlr.org/papers/v20/16-155.html.
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3), 337–404.
Batagelj, V. (1981). Note on ultrametric hierarchical clustering algorithms. Psychometrika, 46(3), 351–352.
Bennett, K.D. (1996). Determination of the number of zones in a biostratigraphical sequence. New Phytologist, 132(1), 155–170.
Chavent, M., Kuentz-Simonet, V., Labenne, A., Saracco, J. (2018). ClustGeo: an R package for hierarchical clustering with spatial constraints. Computational Statistics, 33(4), 1799–1822.
Chen, J., & Ye, J. (2008). Training SVM with indefinite kernels. In Cohen, W., McCallum, A., & Roweis, S. (Eds.) Proceedings of the 25th International Conference on Machine Learning (ICML 2008) (pp. 136–146). New York: ACM.
Chen, Y., Garcia, E., Gupta, M., Rahimi, A., Cazzanti, L. (2009). Similarity-based classification: concepts and algorithms. Journal of Machine Learning Research, 10, 747–776.
Danon, L., Diaz-Guilera, A., Duch, J., Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005, P09008.
Dehman, A. (2015). Spatial clustering of linkage disequilibrium blocks for genome-wide association studies, PhD thesis, Université Paris Saclay.
Dixon, J., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J., Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376–380.
Ferligoj, A., & Batagelj, V. (1982). Clustering with relational constraint. Psychometrika, 47(4), 413–426.
Fraser, J., Ferrai, C., Chiariello, A.M., Schueler, M., Rito, T., Laudanno, G., Barbieri, M., Moore, B.L., Kraemer, D.C., Aitken, S., Xie, S.Q., Morris, K.J., Itoh, M., Kawaji, H., Jaeger, I., Hayashizaki, Y., Carninci, P., Forrest, A.R., The FANTOM Consortium, Semple, C.A., Dostie, J., Pombo, A., Nicodemi, M. (2015). Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Molecular Systems Biology, 11, 852.
Gordon, A. (1996). A survey of constrained classification. Computational Statistics & Data Analysis, 21(1), 17–29.
Grimm, E.C. (1987). CONISS: A FORTRAN 77 program for stratigraphically constrained analysis by the method of incremental sum of squares. Computers & Geosciences, 13(1), 13–35.
Haddad, N., Vaillant, C., Jost, D. (2017). IC-Finder: inferring robustly the hierarchical organization of chromatin folding. Nucleic Acids Research, 45(10), e81.
Hartigan, J.A. (1967). Representation of similarity matrices by trees. Journal of the American Statistical Association, 62(320), 1140–1158.
Imakaev, M., Fudenberg, G., McCord, R., Naumova, N., Goloborodko, A., Lajoie, B., Dekker, J., Mirny, L. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003.
Johnson, S.C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254.
Krislock, N., & Wolkowicz, H. (2012). Euclidean distance matrices and applications. In Handbook on semidefinite, conic and polynomial optimization, International Series in Operations Research & Management Science, Vol. 166 (pp. 879–914). New York: Springer.
Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1–27.
Lance, G., & Williams, W. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. The Computer Journal, 9(4), 373–380.
Lebart, L. (1978). Programme d’agrégation avec contraintes. Les Cahiers de l’Analyse des Données, 3(3), 275–287.
Miyamoto, S., Abe, R., Endo, Y., Takeshita, J.-I. (2015). Ward method of hierarchical clustering for non-Euclidean similarity measures. In Proceedings of the VIIth International Conference of Soft Computing and Pattern Recognition (SoCPaR 2015). Fukuoka: IEEE.
Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? Journal of Classification, 31(3), 274–295.
Qin, J., Lewis, D.P., Noble, W.S. (2003). Kernel hierarchical gene clustering from microarray expression data. Bioinformatics, 19(16), 2097–2104.
Rammal, R., Toulouse, G., Virasoro, M.A. (1986). Ultrametricity for physicists. Reviews of Modern Physics, 58(3), 765–788.
Schleif, F.-M., & Tino, P. (2015). Indefinite proximity learning: a review. Neural Computation, 27(10), 2039–2096.
Schoenberg, I. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Annals of Mathematics, 36, 724–732.
Schölkopf, B., & Smola, A.J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.
Steinley, D., & Hubert, L. (2008). Order-constrained solutions in K-means clustering: even better than being globally optimal. Psychometrika, 73(4), 647–664.
Strauss, T., & von Maltitz, M.J. (2017). Generalising Ward’s method for use with Manhattan distances. PLoS ONE, 12, e0168288.
Székely, G.J., & Rizzo, M.L. (2005). Hierarchical clustering via joint between-within distances: extending Ward’s minimum variance method. Journal of Classification, 22(2), 151–183.
Ward, J.H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York: Springer.
Wishart, D. (1969). An algorithm for hierarchical classifications. Biometrics, 25(1), 165–170.
Young, G., & Householder, A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.
Zufferey, M., Tavernari, D., Oricchio, E., Ciriello, G. (2018). Comparison of computational methods for the identification of topologically associating domains. Genome Biology, 19(1), 217.
Acknowledgments
The authors would like to thank Marie Chavent for numerous instructive discussions on this paper. The authors are grateful to the GenoToul bioinformatics platform (INRAE Toulouse, http://bioinfo.genotoul.fr/) and its staff for providing computing facilities.
Funding
The PhD thesis of N.R. is funded by the INRAE/Inria doctoral program 2018. This work was also supported by the SCALES project funded by CNRS (Mission “Osez l’interdisciplinarité”).
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1. Proof of Proposition 2
We begin by noting that, by Proposition 1, the only reversals that may occur are crossovers. With the notation of Proposition 2, a crossover at step t + 1 corresponds to the situation where:
By symmetry, we focus on the first case. With the notation of Proposition 2, and using the Lance-Williams formula (4), the first condition is equivalent to:
while the second one is equivalent to:
hence the result. □
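Formula (4) is not shown here, so the sketch below assumes it takes the classical Lance-Williams form for Ward's linkage, with coefficients \(\alpha_A = (n_A + n_C)/N\), \(\alpha_B = (n_B + n_C)/N\), \(\beta = -n_C/N\), \(\gamma = 0\) and \(N = n_A + n_B + n_C\), applied to the linkage defined as the ESS increase on one-dimensional data; the article's normalization may differ. It checks the update against a direct computation and tests whether merging \(A \cup B\) with \(C\), right after merging \(A\) and \(B\), produces a lower height. All names are hypothetical.

```python
def ward_linkage(n_a: int, g_a: float, n_b: int, g_b: float) -> float:
    """Ward linkage: ESS increase when merging clusters of sizes n and 1-D centroids g."""
    return n_a * n_b / (n_a + n_b) * (g_a - g_b) ** 2


def lance_williams_ward(n_a: int, n_b: int, n_c: int,
                        d_ac: float, d_bc: float, d_ab: float) -> float:
    """Linkage between A u B and C, computed from the linkages between A, B and C.

    Assumes the classical Ward coefficients (alpha_A, alpha_B, beta, gamma = 0).
    """
    total = n_a + n_b + n_c
    return ((n_a + n_c) * d_ac + (n_b + n_c) * d_bc - n_c * d_ab) / total


def merge_is_lower(n_a: int, n_b: int, n_c: int,
                   d_ac: float, d_bc: float, d_ab: float) -> bool:
    """True if merging A u B with C right after merging A and B gives a lower height."""
    return lance_williams_ward(n_a, n_b, n_c, d_ac, d_bc, d_ab) < d_ab


# A = {10}, B = {1}, C = {0}: A and B merge first (as under a contiguity
# constraint with input order 0, 10, 1), although B is much closer to C.
d_ab = ward_linkage(1, 10.0, 1, 1.0)  # 40.5
d_ac = ward_linkage(1, 10.0, 1, 0.0)  # 50.0
d_bc = ward_linkage(1, 1.0, 1, 0.0)   # 0.5
print(lance_williams_ward(1, 1, 1, d_ac, d_bc, d_ab))  # approximately 20.17
print(ward_linkage(2, 5.5, 1, 0.0))                    # same value, computed directly
print(merge_is_lower(1, 1, 1, d_ac, d_bc, d_ab))       # True
```

If \(A\) and \(B\) were the closest pair among all current clusters, as in unconstrained HAC, then \(\delta(A,C) \geq \delta(A,B)\) and \(\delta(B,C) \geq \delta(A,B)\), and the update gives \(\delta(A \cup B, C) \geq [(n_A + n_C) + (n_B + n_C) - n_C]\,\delta(A,B)/N = \delta(A,B)\), so the height cannot decrease. Under a contiguity constraint, \(\delta(B,C)\) can be smaller than \(\delta(A,B)\) when \(B\) and \(C\) are not adjacent, which is what the example exploits.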
Appendix 2. Step-by-step Description of the Counter-Examples
In the following tables, bold values indicate reversals. Italic values in Table 3 highlight the value of the objective function (\(\mathrm{ESS}_t\)) for the clustering with three clusters.
Appendix 3. Counter-Example of the Monotonicity of \(\bar {I}_{t}\) for Standard HAC in the Euclidean Case
About this article
Cite this article
Randriamihamison, N., Vialaneix, N. & Neuvial, P. Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints. J Classif 38, 363–389 (2021). https://doi.org/10.1007/s00357-020-09377-y
DOI: https://doi.org/10.1007/s00357-020-09377-y