Primary Clusters Selection Using Adaptive Algorithms

  • Conference paper
Advances in Artificial Intelligence and Soft Computing (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9413))

Abstract

Data clustering partitions samples into clusters so that the samples within each cluster are maximally similar to one another and maximally distant from the samples of other clusters. Because clustering is unsupervised, choosing a single algorithm to cluster an unknown dataset carries considerable risk, and the best option is usually not found. Owing to the complexity of the problem and the weakness of basic clustering methods, most studies have turned to combined clustering methods. We call the output partition of a clustering algorithm a result. The diversity of the results in an ensemble of basic clusterings is one of the most important factors affecting the quality of the final result; the quality of the individual results is another. Both factors have been considered in recent research on combined clustering. We propose a new framework that improves the efficiency of combined clustering by selecting a subset of the primary clusters. Selecting a proper subset plays a crucial role in the performance of our method. The main idea is to select the clusters that are stable, and the selection is carried out by intelligent search algorithms. Clusters are assessed with a stability criterion based on mutual information. Finally, the selected clusters are aggregated by consensus functions. Experimental results on several standard datasets show that the proposed method can effectively improve on the complete ensemble method.
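The pipeline described in the abstract (score each primary cluster for stability with mutual information, keep the stable ones, then aggregate them) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the geometric-mean normalization of mutual information, the fixed threshold used in place of the intelligent search, and the co-association matrix used as the aggregation step are all assumptions made for illustration.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two label sequences (assumed form)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(nij * n / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 0.0  # a single-cluster labeling carries no information
    return mi / math.sqrt(ha * hb)


def cluster_stability(partitions, p_idx, label):
    """Mean NMI between one cluster's membership indicator and the other partitions."""
    indicator = [1 if l == label else 0 for l in partitions[p_idx]]
    others = [p for k, p in enumerate(partitions) if k != p_idx]
    return sum(nmi(indicator, p) for p in others) / len(others)


def select_stable_clusters(partitions, threshold):
    """Keep (partition index, label) pairs whose stability reaches the threshold.

    Hypothetical stand-in for the search step: a plain threshold, not a
    genetic algorithm or simulated annealing.
    """
    return [(k, label)
            for k, p in enumerate(partitions)
            for label in sorted(set(p))
            if cluster_stability(partitions, k, label) >= threshold]


def co_association(partitions, selected):
    """Count, for each pair of samples, the selected clusters that contain both."""
    n = len(partitions[0])
    matrix = [[0] * n for _ in range(n)]
    for k, label in selected:
        members = [i for i, l in enumerate(partitions[k]) if l == label]
        for i in members:
            for j in members:
                matrix[i][j] += 1
    return matrix


# Toy ensemble: two agreeing partitions and one that cuts across them.
partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
selected = select_stable_clusters(partitions, threshold=0.5)
matrix = co_association(partitions, selected)
```

In this toy ensemble the clusters of the third partition score low because they disagree with the other two, so only the clusters of the first two partitions contribute to the co-association counts. A final partition could then be obtained by clustering that matrix, for example with a hierarchical method.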

Author information

Corresponding author

Correspondence to Hamid Parvin.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ahmadi, S., Parvin, H., Rad, F. (2015). Primary Clusters Selection Using Adaptive Algorithms. In: Sidorov, G., Galicia-Haro, S. (eds) Advances in Artificial Intelligence and Soft Computing. MICAI 2015. Lecture Notes in Computer Science, vol 9413. Springer, Cham. https://doi.org/10.1007/978-3-319-27060-9_40

  • DOI: https://doi.org/10.1007/978-3-319-27060-9_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27059-3

  • Online ISBN: 978-3-319-27060-9

  • eBook Packages: Computer Science, Computer Science (R0)
