Abstract
Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with largest diameter and partitioning it into two clusters whose largest diameter is as small as possible. We provide two such algorithms with complexities \(O(\overline M N^2)\) and \(O(N^2 \log N)\) respectively, where \(\overline M\) denotes the maximum number of clusters in a partition and \(N\) the number of entities to be clustered. The former algorithm, an efficient implementation of an algorithm of Hubert, finds all partitions into at most \(\overline M\) clusters and is in \(O(N^2)\) for fixed \(\overline M\). Moreover, if in each bipartition the size of the largest cluster is bounded by \(p\) times the number of entities in the set to be partitioned, with \(1/2 \le p < 1\), it provides a complete hierarchy of partitions in \(O(N^2 \log N)\) time. The latter algorithm, a refinement of an algorithm of Rao, builds a complete hierarchy of partitions in \(O(N^2 \log N)\) time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm of Benzécri are reported.
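The splitting scheme described above can be sketched as follows. This is a deliberately naive illustration: the bipartition step enumerates all splits exhaustively (exponential time), whereas the paper's algorithms achieve \(O(\overline M N^2)\) and \(O(N^2 \log N)\); all function names are our own, and `d` is assumed to be a symmetric dissimilarity matrix.

```python
from itertools import combinations

def diameter(cluster, d):
    # Diameter of a cluster: largest pairwise dissimilarity within it.
    return max((d[i][j] for i, j in combinations(sorted(cluster), 2)),
               default=0.0)

def best_bipartition(cluster, d):
    # Enumerate every bipartition (fixing one element in the first part
    # to skip mirror images) and keep the split whose larger diameter is
    # smallest. Exponential in |cluster|: illustration only, not the
    # polynomial algorithms of the paper.
    items = sorted(cluster)
    anchor, rest_items = items[0], items[1:]
    best_score, best_split = None, None
    for r in range(len(items) - 1):
        for rest in combinations(rest_items, r):
            a = {anchor, *rest}
            b = set(items) - a
            score = max(diameter(a, d), diameter(b, d))
            if best_score is None or score < best_score:
                best_score, best_split = score, (a, b)
    return best_split

def divisive_clustering(n, d, k):
    # Start from one all-inclusive cluster; repeatedly split the cluster
    # of largest diameter until k clusters are obtained.
    partition = [set(range(n))]
    while len(partition) < k:
        worst = max(partition, key=lambda c: diameter(c, d))
        if len(worst) < 2:
            break
        a, b = best_bipartition(worst, d)
        partition.remove(worst)
        partition.extend([a, b])
    return partition
```

On four points on a line at 0, 1, 10, 11 with absolute-difference dissimilarities, requesting two clusters recovers the obvious split {0, 1} / {10, 11}, since that bipartition's larger diameter (1) beats every alternative.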
Résumé
(Translated from the French.) Divisive hierarchical clustering algorithms using the diameter criterion recursively select the class of largest diameter and partition it into two classes whose largest diameter is as small as possible. We propose two such algorithms, with complexities \(O(\overline M N^2)\) and \(O(N^2 \log N)\) respectively, where \(\overline M\) denotes the maximum number of classes in a partition and \(N\) the number of objects to be classified. The first algorithm, an implementation of an algorithm of Hubert, builds partitions with at most \(\overline M\) classes and is in \(O(N^2)\) for fixed \(\overline M\). Moreover, if in each bipartition the number of objects in the largest class is bounded by \(p\) times the number of objects in the set to be partitioned, where \(1/2 \le p < 1\), this algorithm builds a complete hierarchy of partitions in \(O(N^2 \log N)\) time. The second algorithm, a refinement of an algorithm of Rao, builds a complete hierarchy of partitions in \(O(N^2 \log N)\) time without any restriction. Comparative computational results are also presented for the two algorithms and for the agglomerative hierarchical clustering algorithm of Benzécri.
References
AHO, A. V., HOPCROFT, J. E., and ULLMAN, J. D. (1974), The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley.
BENZÉCRI, J. P. (1973), L'Analyse des Données. 1, La Taxinomie, Paris: Dunod.
BENZÉCRI, J. P. (1982), "Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques," Les Cahiers de l'Analyse des Données, VII(2), 209–218.
BRUCKER, P. (1978), "On the Complexity of Clustering Problems," in Optimization and Operations Research, Lecture Notes in Economics and Mathematical Systems, 157, Eds. M. Beckmann and H. Künzi, Heidelberg: Springer-Verlag, 45–54.
DAY, W. H. E., and EDELSBRUNNER, H. (1984), "Efficient Algorithms for Agglomerative Hierarchical Clustering Methods," Journal of Classification, 1, 7–24.
DEFAYS, D. (1977), "An Efficient Algorithm for a Complete Link Method," The Computer Journal, 20(4), 364–366.
DELATTRE, M., and HANSEN, P. (1980), "Bicriterion Cluster Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(4), 277–291.
DIDAY, E., et al. (1979), Optimisation en Classification Automatique, Le Chesnay: INRIA.
DIJKSTRA, E. W. (1959), "A Note on Two Problems in Connexion with Graphs," Numerische Mathematik, 1, 269–271.
DORING, V., HANSEN, P., and JAUMARD, B. (1990), "Divisive Hierarchical Clustering with the Diameter Criterion: Hubert's Algorithm," Cahiers du GERAD (forthcoming).
DORING, V., HANSEN, P., and JAUMARD, B. (1990), "Divisive Hierarchical Clustering with the Diameter Criterion: Algorithm Alltrees," Cahiers du GERAD (forthcoming).
EDWARDS, A. W. F., and CAVALLI-SFORZA, L. L. (1965), "A Method for Cluster Analysis," Biometrics, 21(2), 362–375.
FISHER, R. A. (1936), "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, VII(II), 179–188.
GAREY, M. R., and JOHNSON, D. S. (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco: Freeman.
GRÖTSCHEL, M., and WAKABAYASHI, Y. (1989), "A Cutting-plane Algorithm for a Clustering Problem," Mathematical Programming, 45(1), Series B, 59–96.
GUÉNOCHE, A. (1989), "Partitions with Minimum Diameter," Conference of the I.F.C.S., Charlottesville.
HANSEN, P., and DELATTRE, M. (1978), "Complete Link Cluster Analysis by Graph Coloring," Journal of the American Statistical Association, 73, 397–403.
HANSEN, P., and JAUMARD, B. (1987), "Minimum Sum of Diameters Clustering," Journal of Classification, 4, 215–226.
HARMAN, H. H. (1967), Modern Factor Analysis, Chicago: University of Chicago Press.
HUBERT, L. (1973), "Monotone Invariant Clustering Procedures," Psychometrika, 38(1), 47–62.
JAMBU, M. (1978), Classification automatique pour l'analyse des données, Paris: Dunod (Cluster Analysis and Data Analysis, Amsterdam: North Holland, 1983).
JUAN, J. (1982), "Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques," Les Cahiers de l'Analyse des Données, VII(2), 219–225.
KRUSKAL, J. B. (1956), "On the Shortest Spanning Subtree of a Graph," Proceedings of the American Mathematical Society, 7, 48–50.
LANCE, G. N., and WILLIAMS, W. T. (1967), "Mixed-data Classificatory Systems: I. Agglomerative Systems," Australian Computer Journal, 1, 15–20.
LECLERC, B. (1986), "Caractérisation, construction et dénombrement des ultramétriques supérieures minimales," Statistique et Analyse des Données, 11(2), 26–50.
McNAUGHTON-SMITH, P., WILLIAMS, W. T., DALE, M. B., and MOCKETT, L. G. (1964), "Dissimilarity Analysis," Nature, 202, 1034–1035.
MONMA, C., and SURI, S. (1989), "Partitioning Points and Graphs to Minimize the Maximum or the Sum of Diameters," Proceedings of the Sixth International Conference on the Theory and Applications of Graphs, New York: Wiley.
MURTAGH, F. (1983), "A Survey of Recent Advances in Hierarchical Clustering Algorithms," The Computer Journal, 26(4), 354–359.
MURTAGH, F. (1984), "Complexities of Hierarchic Clustering Algorithms: State of the Art," Computational Statistics Quarterly, 1(2), 101–113.
PRIM, R. C. (1957), "Shortest Connection Networks and Some Generalizations," Bell System Technical Journal, 36, 1389–1401.
RAO, M. R. (1971), "Cluster Analysis and Mathematical Programming," Journal of the American Statistical Association, 66, 622–626.
VICHI, M. (1985), "On a Flexible and Computationally Feasible Divisive Clustering Technique," Rivista di Statistica Applicata, 18(4), 200–208.
WELCH, W. J. (1982), "Algorithmic Complexity: Three NP-Hard Problems in Computational Statistics," Journal of Statistical Computation and Simulation, 15, 17–25.
WILLIAMS, J. W. J. (1964), "Algorithm 232, Heapsort," Communications of the Association for Computing Machinery, 7(6), 347–348.
Guénoche, A., Hansen, P. & Jaumard, B. Efficient algorithms for divisive hierarchical clustering with the diameter criterion. Journal of Classification 8, 5–30 (1991). https://doi.org/10.1007/BF02616245