Primary Clusters Selection Using Adaptive Algorithms

Ahmadi, Shahrbanoo; Parvin, Hamid; Rad, Farhad

doi:10.1007/978-3-319-27060-9_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9413)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Data clustering partitions samples into clusters so that the samples within each cluster are maximally similar to one another and maximally distant from the samples of other clusters. Because clustering is unsupervised, choosing a specific algorithm to cluster an unknown dataset is risky, and the best option is usually not found. Given the complexity of the problem and the weakness of basic clustering methods, most studies have turned to combined (ensemble) clustering methods. We call the output partition of a clustering algorithm a result. The diversity of the results in an ensemble of basic clusterings is one of the most important factors affecting the quality of the final result; the quality of those individual results is another. Recent research on combined clustering considers both factors. We propose a new framework that improves the efficiency of combined clustering by selecting a subset of the primary clusters. Selecting a proper subset is crucial to the performance of our method. The main idea is to select the clusters that are stable, and the selection is carried out by intelligent search algorithms. Clusters are assessed with a stability criterion based on mutual information. Finally, the selected clusters are aggregated by consensus functions. Experimental results on several standard datasets show that the proposed method effectively improves on the complete ensemble method.
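The idea of scoring clusters by a mutual-information-based stability criterion and keeping only the stable ones can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact stability formula or the search algorithm, so the code below assumes that a cluster is scored by the average normalized mutual information (NMI) between its binary split {cluster, rest} and the other partitions of the ensemble, and it replaces the intelligent search with a plain threshold. The function names (nmi, cluster_stability, select_stable_clusters) and the threshold value are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same samples."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(nij * n / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        return 0.0  # a one-cluster labeling carries no information
    return mi / math.sqrt(ha * hb)


def cluster_stability(members, n_samples, reference_partitions):
    """Assumed criterion: mean NMI between {cluster, rest} and each reference partition."""
    indicator = [1 if i in members else 0 for i in range(n_samples)]
    scores = [nmi(indicator, p) for p in reference_partitions]
    return sum(scores) / len(scores)


def select_stable_clusters(partitions, threshold):
    """Return (partition index, cluster label, stability) for clusters at or above threshold.

    Thresholding stands in for the search step; it is a simplification.
    """
    n = len(partitions[0])
    selected = []
    for k, part in enumerate(partitions):
        others = partitions[:k] + partitions[k + 1:]  # score against the rest of the ensemble
        for label in sorted(set(part)):
            members = {i for i, l in enumerate(part) if l == label}
            s = cluster_stability(members, n, others)
            if s >= threshold:
                selected.append((k, label, s))
    return selected


if __name__ == "__main__":
    # Toy ensemble: two partitions agree, the third cuts across them.
    ensemble = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
    ]
    for k, label, s in select_stable_clusters(ensemble, threshold=0.5):
        print(f"partition {k}, cluster {label}: stability {s:.3f}")
```

In this toy ensemble the clusters of the two agreeing partitions pass the threshold, while the clusters of the third partition do not. A consensus function, such as one built on a co-association matrix, would then be applied to the selected clusters only.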

Author information

Authors and Affiliations

Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Shahrbanoo Ahmadi, Hamid Parvin & Farhad Rad

Corresponding author

Correspondence to Hamid Parvin.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov
Facultad de ciencias, Universidad Autónoma Nacional, México, Distrito Federal, Mexico
Sofía N. Galicia-Haro

About this paper

Cite this paper

Ahmadi, S., Parvin, H., Rad, F. (2015). Primary Clusters Selection Using Adaptive Algorithms. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science, vol 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_40

DOI: https://doi.org/10.1007/978-3-319-27060-9_40
Published: 30 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27059-3
Online ISBN: 978-3-319-27060-9
eBook Packages: Computer Science, Computer Science (R0)
