Abstract
Data clustering partitions samples into clusters so that the samples within each cluster are as similar to one another as possible and as distant as possible from the samples of other clusters. Because clustering is unsupervised, choosing a specific algorithm for an unknown dataset is risky, and the best option is usually not found. Given the complexity of the problem and the limited effectiveness of basic clustering methods, most studies have turned to combined clustering methods. We call the output partition of a clustering algorithm a result. The diversity of the results in an ensemble of basic clusterings is one of the most important factors affecting the quality of the final result; the quality of those individual results is another. Both factors have been considered in recent research on combined clustering. We propose a new framework that improves the efficiency of combined clustering by selecting a subset of the primary clusters. Selecting a proper subset is crucial to the performance of our method, and the selection is carried out by intelligent search algorithms. The main idea of the selection is to use the clusters that are stable. To assess the clusters, a stability criterion based on mutual information is used. Finally, the selected clusters are aggregated by consensus functions. Experimental results on several standard datasets show that the proposed method effectively improves on the complete ensemble method.
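The pipeline described above (score each primary cluster by a mutual-information-based stability, keep the stable ones, then aggregate them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stability score (mean best NMI against every partition), the fixed threshold that stands in for the intelligent search algorithms, the vote-based merge that stands in for the consensus functions, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information between two label sequences (0..1)."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0


def clusters_of(partition):
    """Split a label vector into clusters, each a frozenset of sample indices."""
    groups = {}
    for i, label in enumerate(partition):
        groups.setdefault(label, set()).add(i)
    return [frozenset(g) for g in groups.values()]


def stability(cluster, partitions, n):
    """Assumed score: mean over partitions of the best NMI between this
    cluster's in/out indicator and the indicator of any cluster there."""
    ind = [i in cluster for i in range(n)]
    total = 0.0
    for p in partitions:
        total += max(nmi(ind, [i in c for i in range(n)]) for c in clusters_of(p))
    return total / len(partitions)


def select_stable(partitions, threshold):
    """Keep every primary cluster whose stability reaches the threshold.
    (A plain threshold replaces the search algorithms of the paper.)"""
    n = len(partitions[0])
    pool = [c for p in partitions for c in clusters_of(p)]
    return [c for c in pool if stability(c, partitions, n) >= threshold]


def consensus(clusters, n, min_votes=1):
    """Simplified consensus: count how often two samples share a selected
    cluster, then merge pairs with at least `min_votes` via union-find."""
    votes = Counter()
    for c in clusters:
        members = sorted(c)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                votes[(members[a], members[b])] += 1
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), v in votes.items():
        if v >= min_votes:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


if __name__ == "__main__":
    # Three primary partitions of six samples; the third one is noisy.
    partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 0, 1]]
    stable = select_stable(partitions, threshold=0.5)
    print(consensus(stable, n=6))
```

In this toy ensemble the two clusters of the noisy third partition agree poorly with the others and fall below the threshold, so only the clusters shared by the first two partitions reach the aggregation step.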
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ahmadi, S., Parvin, H., Rad, F. (2015). Primary Clusters Selection Using Adaptive Algorithms. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science(), vol 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_40
DOI: https://doi.org/10.1007/978-3-319-27060-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27059-3
Online ISBN: 978-3-319-27060-9
eBook Packages: Computer Science, Computer Science (R0)