Abstract
Distributed machine learning is the problem of inferring a desired relation when the training data are distributed across a network of agents (e.g., sensor networks, robot swarms). A typical unsupervised learning problem is clustering, that is, grouping patterns according to a similarity or dissimilarity measure. Provided they are highly scalable, fault-tolerant, and energy-efficient, clustering algorithms can be adopted in large-scale distributed systems. This work surveys the state of the art in this field, presenting algorithms that solve the distributed clustering problem efficiently, with particular attention to the computation and clustering criteria they adopt.
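To make the distributed clustering setting concrete, the following is a minimal sketch of one round of a generic distributed k-means scheme. It is an illustration only, not the method of this work or of any specific surveyed algorithm. Each agent assigns its own patterns to the nearest centroid (squared Euclidean distance as the dissimilarity measure) and shares only per-cluster sums and counts, never its raw data. The sketch assumes a fusion step that can add up these summaries. A fully decentralized network would instead need a peer-to-peer scheme, such as consensus averaging, to compute the same sums. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_stats(data: Sequence[Point], centroids: Sequence[Point]):
    # Run by each agent on its private data: per-cluster coordinate sums and counts.
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x in data:
        j = nearest(x, centroids)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += x[i]
    return sums, counts


def distributed_kmeans_step(
    agents_data: Sequence[Sequence[Point]], centroids: Sequence[Point]
):
    # Fusion step (hypothetical): add up the k*(d+1) numbers sent by each agent
    # and recompute the global centroids from the pooled statistics.
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for data in agents_data:
        sums, counts = local_stats(data, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    new_centroids = []
    for j in range(k):
        if total_counts[j] == 0:
            # No agent assigned anything to cluster j: keep its old centroid.
            new_centroids.append(tuple(centroids[j]))
        else:
            new_centroids.append(tuple(s / total_counts[j] for s in total_sums[j]))
    return new_centroids


if __name__ == "__main__":
    agents = [
        [(0.0, 0.0), (0.0, 1.0)],      # agent A
        [(1.0, 0.0), (10.0, 10.0)],    # agent B
        [(10.0, 11.0), (11.0, 10.0)],  # agent C
    ]
    print(distributed_kmeans_step(agents, [(0.0, 0.0), (10.0, 10.0)]))
```

Because sums and counts are additive, one round yields exactly the centroids that centralized k-means would compute on the pooled data. In the example above these are (1/3, 1/3) and (31/3, 31/3). The communication cost per agent depends on the number of clusters and the data dimension rather than on the number of local patterns.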
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Rosato, A., Altilio, R., Panella, M. (2016). Recent Advances on Distributed Unsupervised Learning. In: Bassis, S., Esposito, A., Morabito, F., Pasero, E. (eds) Advances in Neural Networks. WIRN 2015. Smart Innovation, Systems and Technologies, vol 54. Springer, Cham. https://doi.org/10.1007/978-3-319-33747-0_8
DOI: https://doi.org/10.1007/978-3-319-33747-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33746-3
Online ISBN: 978-3-319-33747-0
eBook Packages: Engineering, Engineering (R0)