Abstract
The intelligent Minkowski and weighted Minkowski K-means are recently developed, effective clustering algorithms capable of computing feature weights. Their cluster-specific weights follow the intuitive idea that a feature with low dispersion in a given cluster should receive a greater weight in that cluster than a feature with high dispersion. The final clustering produced by these techniques depends on the choice of the Minkowski exponent. The median-based central consensus rule we introduce in this paper allows one to select an optimal value of this exponent. Our rule computes the Adjusted Rand Index (ARI) between the clustering solutions obtained for different Minkowski exponents and selects the clustering with the highest average ARI with respect to the other solutions. Our simulations, carried out with real and synthetic data, show that the proposed median-based consensus procedure usually outperforms clustering strategies based on selecting the highest value of the Silhouette or Calinski–Harabasz cluster validity indices.
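The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the clusterings for each candidate exponent have already been computed, and the names `adjusted_rand_index`, `consensus_exponent`, and the toy label vectors are hypothetical. The handling of the degenerate case in the ARI function is likewise an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert and Arabie, 1985) between two partitions."""
    n = len(labels_a)
    # Pair counts within contingency-table cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Assumption: treat the degenerate case (e.g. both partitions trivial) as agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def consensus_exponent(partitions):
    """Pick the exponent whose partition has the highest mean ARI with all others.

    `partitions` maps each candidate Minkowski exponent to a sequence of labels.
    """
    if len(partitions) < 2:
        raise ValueError("need clusterings for at least two exponents")
    best_p, best_score = None, float("-inf")
    for p, labels in partitions.items():
        scores = [adjusted_rand_index(labels, other)
                  for q, other in partitions.items() if q != p]
        mean_ari = sum(scores) / len(scores)
        if mean_ari > best_score:
            best_p, best_score = p, mean_ari
    return best_p, best_score


# Toy labels for three hypothetical exponents (made up for illustration only).
partitions = {
    1.0: [0, 0, 0, 0, 1, 1],
    2.0: [0, 0, 0, 1, 1, 1],
    3.0: [0, 0, 1, 1, 1, 1],
}
best_p, best_score = consensus_exponent(partitions)
print(best_p, round(best_score, 3))
```

In this toy input the partition for exponent 2.0 sits between the other two, so it agrees with them most on average and is the one selected. Ties are broken in favour of the first exponent encountered, which is a choice made for this sketch only.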
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
de Amorim, R.C., Tahiri, N., Mirkin, B., Makarenkov, V. (2017). A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_8
DOI: https://doi.org/10.1007/978-3-319-55723-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55722-9
Online ISBN: 978-3-319-55723-6
eBook Packages: Mathematics and Statistics (R0)