A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering

de Amorim, Renato Cordeiro; Tahiri, Nadia; Mirkin, Boris; Makarenkov, Vladimir

doi:10.1007/978-3-319-55723-6_8

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The intelligent Minkowski and weighted Minkowski K-means are recently developed, effective clustering algorithms that compute feature weights. Their cluster-specific weights follow the intuitive idea that a feature with low dispersion in a given cluster should receive a greater weight in that cluster than a feature with high dispersion. The final clustering produced by these techniques depends on the choice of the Minkowski exponent. The median-based central consensus rule introduced in this paper allows one to select an optimal value of this exponent. The rule computes the Adjusted Rand Index (ARI) between the clustering solutions obtained for different Minkowski exponents and selects the clustering with the highest average ARI value. Our simulations, carried out with real and synthetic data, show that the proposed median-based consensus procedure usually outperforms strategies that select the clustering with the highest value of the Silhouette or Calinski–Harabasz cluster validity index.
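The selection step described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: it assumes the intelligent or weighted Minkowski K-means runs have already produced one label vector per candidate exponent. It computes the Hubert and Arabie ARI between every pair of these partitions and returns the exponent whose partition has the highest mean ARI to all the others, i.e. a medoid-like "central" partition. The helper names `adjusted_rand_index` and `central_exponent`, the exponent grid and the example labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Dict, Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Hubert-Arabie adjusted Rand index between two labelings of the same objects."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of equal length, with at least 2 objects")
    # Object pairs placed together in both partitions (cells of the contingency table).
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Object pairs placed together in a, and in b (row and column margins).
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(len(a), 2)
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        # Only happens when both partitions are a single cluster or both are all
        # singletons, i.e. the two partitions are identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def central_exponent(partitions: Dict[float, Sequence[Hashable]]) -> float:
    """Return the exponent whose partition has the highest mean ARI to all the others."""
    if len(partitions) < 2:
        raise ValueError("need partitions for at least two exponents")
    totals = {p: 0.0 for p in partitions}
    for p, q in combinations(partitions, 2):
        score = adjusted_rand_index(partitions[p], partitions[q])
        totals[p] += score
        totals[q] += score
    # Every exponent is compared with the same number of others, so the largest
    # total is also the largest mean; max() keeps the first exponent on ties.
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical label vectors for six objects, one per candidate exponent p.
    partitions = {
        1.0: [0, 0, 0, 1, 1, 1],
        1.5: [0, 0, 0, 1, 1, 1],
        2.0: [0, 0, 1, 1, 1, 1],
        3.0: [0, 1, 0, 1, 0, 1],
    }
    print(central_exponent(partitions))  # prints 1.0
```

In this toy example the partitions for p = 1.0 and p = 1.5 agree with each other and partly with p = 2.0, so they have the highest mean ARI; the tie is broken in favour of the first exponent listed. The pairwise loop needs m(m-1)/2 ARI computations for m candidate exponents, each linear in the number of objects.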

Author information

Authors and Affiliations

School of Computer Science, University of Hertfordshire, College Lane, Hatfield, AL10 9AB, UK
Renato Cordeiro de Amorim
Département d’informatique, Université du Québec à Montréal, C.P. 8888 succ. Centre-Ville, Montreal, QC, Canada, H3C 3P8
Nadia Tahiri & Vladimir Makarenkov
Department of Data Analysis and Machine Intelligence, National Research University, Higher School of Economics, Moscow, Russia
Boris Mirkin
Department of Computer Science and Information Systems, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
Boris Mirkin

Corresponding author

Correspondence to Vladimir Makarenkov.

Editor information

Editors and Affiliations

Department of Political Sciences, University of Naples Federico II, Napoli, Italy
Francesco Palumbo
Department of Statistical Sciences Paolo Fortunati, Alma Mater Studiorum, University of Bologna, Bologna, Italy
Angela Montanari
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

de Amorim, R.C., Tahiri, N., Mirkin, B., Makarenkov, V. (2017). A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_8

DOI: https://doi.org/10.1007/978-3-319-55723-6_8
Published: 05 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55722-9
Online ISBN: 978-3-319-55723-6
eBook Packages: Mathematics and Statistics (R0)
