An Improvement of Stability Based Method to Clustering

  • Conference paper
Advanced Computational Methods for Knowledge Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 358)

Abstract

In recent years, the concept of clustering stability has been widely used to determine the number of clusters in a given dataset. This paper proposes an improvement to stability-based methods that rely on the bootstrap technique. The improvement is achieved by combining the instability property with an evaluation criterion and by using a clustering algorithm based on DCA (Difference of Convex functions Algorithm). DCA is an innovative approach in nonconvex programming that has been successfully applied to many large-scale (smooth or nonsmooth) nonconvex programs in various domains. Experimental results on both synthetic and real datasets are promising and demonstrate the effectiveness of the approach.
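The bootstrap-based instability idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the evaluation criterion or the DCA-based clustering algorithm, so plain k-means is swapped in as the clustering step and only the instability score is computed. All function names, parameters (`n_boot`, `n_init`, `spread`), and the toy data are assumptions made for illustration.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, rng, n_init=3, iters=50):
    """Lloyd's k-means with restarts; a stand-in for the DCA-based step."""
    best_centers, best_sse = None, float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[nearest(p, centers)].append(p)
            # An empty cluster keeps its previous center.
            updated = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if updated == centers:
                break
            centers = updated
        sse = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)
        if sse < best_sse:
            best_centers, best_sse = centers, sse
    return best_centers


def clustering_distance(labels_a, labels_b):
    """Fraction of point pairs whose co-membership differs between labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)


def instability(points, k, n_boot=10, seed=0):
    """Average disagreement between clusterings fitted on bootstrap pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        labelings = []
        for _ in range(2):
            # Resample with replacement, cluster, then label the original data.
            sample = [rng.choice(points) for _ in points]
            centers = kmeans(sample, k, rng)
            labelings.append([nearest(p, centers) for p in points])
        total += clustering_distance(labelings[0], labelings[1])
    return total / n_boot


# Toy data (assumed): three Gaussian blobs in the plane, 20 points each.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]

scores = {k: instability(points, k) for k in range(2, 6)}
best_k = min(scores, key=scores.get)  # candidate k with the lowest instability
```

A lower score means that clusterings fitted on different bootstrap samples agree more often when applied to the original data. Instability alone can tie or mislead for some values of k, which is presumably why the paper pairs it with an evaluation criterion; that combination is not reproduced here.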

Author information

Corresponding author

Correspondence to Ta Minh Thuy.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Thuy, T.M., Thi Hoai An, L. (2015). An Improvement of Stability Based Method to Clustering. In: Le Thi, H., Nguyen, N., Do, T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-17996-4_12

  • DOI: https://doi.org/10.1007/978-3-319-17996-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17995-7

  • Online ISBN: 978-3-319-17996-4

  • eBook Packages: Engineering, Engineering (R0)
