Abstract
The K-means algorithm is one of the most widely used methods in data mining and statistical data analysis for partitioning a set of objects into K distinct groups, called clusters, on the basis of their similarities. The main drawback of the algorithm is that it requires the number of clusters as input, yet in real-world problems this value is very difficult to fix in advance. For this reason, several modified K-means algorithms have been proposed in which the number of clusters is determined at run time, increasing it iteratively until a given cluster quality metric is satisfied. To reduce the high computational cost of this approach, we propose an adaptive procedure in which, at each iteration, two new clusters are created by splitting only the cluster with the worst value of the quality metric.
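The adaptive idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the quality metric, the stopping rule, or how a cluster is split, so the sketch assumes the within-cluster sum of squared errors as the metric, a user-chosen threshold `max_sse`, and a plain two-centre Lloyd iteration for the split. All function and parameter names (`adaptive_kmeans`, `two_means`, `max_sse`, `max_clusters`) are hypothetical, and any global reassignment of points after a split is omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Assumed quality metric: sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster in two with a plain Lloyd iteration (K = 2)."""
    if len(points) < 2:
        return [list(points)]
    centers = rng.sample(points, 2)  # two initial centres drawn from the cluster
    groups = None
    for _ in range(iters):
        new_groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # degenerate split: keep the last valid grouping, if any
        groups = new_groups
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return [list(points)] if groups is None else list(groups)


def adaptive_kmeans(points, max_sse, max_clusters=50, seed=0):
    """Grow the number of clusters by splitting only the worst cluster.

    Stops when every cluster has SSE <= max_sse (assumed stopping rule),
    when max_clusters is reached, or when the worst cluster cannot be split.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < max_clusters:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        if sse(clusters[worst]) <= max_sse:
            break
        parts = two_means(clusters[worst], rng)
        if len(parts) < 2:
            break
        clusters[worst:worst + 1] = parts  # replace the worst cluster by its two halves
    return clusters
```

Calling `adaptive_kmeans(points, max_sse=3.0)` on a list of coordinate tuples returns a list of clusters, each a list of points. Because only one cluster is split per iteration, the other clusters are left untouched, which is where the cost saving over re-clustering the whole data set would come from in this simplified setting.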
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lapegna, M., Mele, V., Romano, D. (2020). An Adaptive Strategy for Dynamic Data Clustering with the K-Means Algorithm. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12044. Springer, Cham. https://doi.org/10.1007/978-3-030-43222-5_9
DOI: https://doi.org/10.1007/978-3-030-43222-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43221-8
Online ISBN: 978-3-030-43222-5
eBook Packages: Computer Science (R0)