Favoring the k-Means Algorithm with Initialization Methods

de Oliveira, Anderson Francisco; do Carmo Nicoletti, Maria

doi:10.1007/978-3-030-16657-1_3


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 940)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications


Abstract

Clustering algorithms are unsupervised algorithms and, among the many available, k-Means can be considered one of the most popular and successful. The performance of k-Means, however, is highly dependent on a ‘good’ initialization of the k group centers (centroids), as well as on the value assigned to the number k of groups the final clustering should have. This chapter describes experiments with five initialization algorithms from the literature, namely Method1, k-Means++, CCIA, Maedeh&Suresh and SPSS, carried out to empirically evaluate their contribution to improving k-Means performance.
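To make the dependence on initialization concrete, here is a minimal Python sketch (not the authors' code) that refines centroids with standard Lloyd-style k-Means from two seedings: k data points drawn uniformly at random, and the k-Means++ D² seeding of Arthur and Vassilvitskii. The helper names (`random_init`, `kmeanspp_init`, `assign`, `lloyd`, `make_blobs`) and the synthetic three-blob data are illustrative assumptions; the other four methods studied in the chapter (Method1, CCIA, Maedeh&Suresh, SPSS) are not reproduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Baseline seeding: k distinct data points drawn uniformly at random."""
    return rng.sample(points, k)


def kmeanspp_init(points, k, rng):
    """k-Means++ seeding: the first centre is uniform; each later centre is a data
    point drawn with probability proportional to D(x)^2, its squared distance to
    the nearest centre chosen so far. Assumes k <= len(points)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point already coincides with a centre; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2)[0])
    return centers


def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, centers, max_iter=100):
    """Lloyd-style k-Means refinement from the given initial centres.
    Returns (centres, labels, SSE), SSE being the within-cluster sum of squares."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep the centre of an empty cluster
        if new_centers == centers:
            break  # converged: centres (and hence assignments) no longer change
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def make_blobs(rng, centres=((0, 0), (5, 5), (0, 5)), n=50, sd=0.5):
    """Hypothetical test data: n Gaussian points around each given centre."""
    return [tuple(rng.gauss(m, sd) for m in c) for c in centres for _ in range(n)]


if __name__ == "__main__":
    data = make_blobs(random.Random(42))
    for name, init in (("random", random_init), ("k-Means++", kmeanspp_init)):
        sses = [lloyd(data, init(data, 3, random.Random(s)))[2] for s in range(20)]
        print(f"{name:10s} best SSE={min(sses):.2f} worst SSE={max(sses):.2f}")
```

Running the script prints the best and worst SSE over 20 seeds for each seeding. Because Lloyd iterations only reach a local minimum, a poor random start can leave two blobs merged under one centroid; comparing the worst-case SSE of the two seedings is one simple way to observe the kind of sensitivity the chapter's experiments address.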


References

MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California Press (1967)
Al-Daoud, M., Roberts, S.A.: New methods for the initialisation of clusters. Pattern Recogn. Lett. 17, 451–455 (1996)
Arthur, D., Vassilvitskii, S.: K-Means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics, USA (2007)
Maedeh, A., Suresh, K.: Design of efficient k-Means clustering algorithm with improved initial centroids. Int. J. Eng. Technol. 5(1), 33–38 (2013)
Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for k-Means clustering. Pattern Recogn. Lett. 25, 1293–1302 (2004)
Pavan, K.K., Rao, A.A., Rao, A.V.D., Sridhar, G.R.: Robust seed selection algorithm for k-means type algorithms. Int. J. Comput. Sci. Inform. Technol. (IJCSIT) 3(5), 147–163 (2011)
Aggarwal, C.C., Reddy, C.K.: Data Clustering Algorithms and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. CRC Press, Boca Raton (2013)
Han, J., Kamber, M., Pei, J.: Data Mining – Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers, Amsterdam (2012)
Burks, S., Harrell, G., Wang, J.: On initial effects of the k-Means clustering. In: Proceedings of the 2015 World Congress in Computer Science, Computer Engineering, & Applied Computing, USA, pp. 200–205 (2015)
Dua, D., Karra Taniskidou, E.: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). University of California, School of Information and Computer Science, Irvine, CA (2017)
Chernoff, H.: The use of faces to represent points in n-dimensional space graphically. Technical report no. 71, Department of Statistics, Stanford University, Stanford, CA, USA (1971)
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J., Ostrowski, E.: Handbook of Small Data Sets, 1st edn. Chapman and Hall/CRC, London (1993)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM Trans. Knowl. Discovery Data 1(1), Article 4, 30 pages (2007). https://doi.org/10.1145/1217299.1217303
Su, M.C., Chou, C.H., Hsieh, C.C.: Fuzzy C-Means algorithm with a point symmetry distance. Int. J. Fuzzy Syst. 7(4), 175–181 (2005)
Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)


Acknowledgments

The authors thank UNIFACCAMP and CNPq for their support. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Authors and Affiliations

Centro Universitário C. Limpo Paulista (UNIFACCAMP), Campo Limpo Paulista, SP, Brazil
Anderson Francisco de Oliveira & Maria do Carmo Nicoletti
Universidade Federal de S. Carlos (UFSCar), São Carlos, SP, Brazil
Maria do Carmo Nicoletti


Corresponding author

Correspondence to Maria do Carmo Nicoletti.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, WA, USA
Ajith Abraham
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
Aswani Kumar Cherukuri
Tijuana Institute of Technology, Tijuana, Mexico
Patricia Melin
Machine Intelligence Research Labs, Auburn, WA, USA
Niketa Gandhi


About this paper

Cite this paper

de Oliveira, A.F., do Carmo Nicoletti, M. (2020). Favoring the k-Means Algorithm with Initialization Methods. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_3


DOI: https://doi.org/10.1007/978-3-030-16657-1_3
Published: 12 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16656-4
Online ISBN: 978-3-030-16657-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
