Evaluating Data Characterization Measures for Clustering Problems in Meta-learning

  • Conference paper
Neural Information Processing (ICONIP 2021)

Abstract

An accurate data characterization is essential for reliably selecting clustering algorithms via meta-learning. This work evaluates a set of measures for characterizing clustering problems, using beta regression and two well-known machine learning regression techniques as meta-models. We identified a subset of meta-features that proved more informative for characterizing the clustering datasets. In addition, secondary findings made it possible to verify the direction and magnitude of each measure's influence, as well as its importance, in predicting the performance of the algorithms under analysis.
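
For readers unfamiliar with beta regression, the usual mean-precision formulation is sketched below. The abstract does not state the link function, the precision structure, or the performance score used, so the logit link, the constant precision \phi, and a response y_i rescaled to the open interval (0, 1) are assumptions made here for illustration only. The symbols x_i (vector of meta-features for dataset i) and \beta (coefficient vector) are likewise introduced for this sketch.

```latex
% Beta regression in the mean-precision parameterization (illustrative sketch).
% Assumptions: logit link, constant precision \phi, response y_i in (0, 1).
\begin{align}
  y_i &\sim \mathrm{Beta}\bigl(\mu_i \phi,\; (1 - \mu_i)\,\phi\bigr),
      \qquad 0 < y_i < 1,\ \phi > 0, \\
  \log\frac{\mu_i}{1 - \mu_i} &= x_i^{\top} \beta, \\
  \mathbb{E}[y_i] &= \mu_i,
  \qquad
  \operatorname{Var}[y_i] = \frac{\mu_i (1 - \mu_i)}{1 + \phi}.
\end{align}
```

Under this form, the sign of a coefficient \beta_j gives the direction of a meta-feature's influence on the expected performance, and its size (on the logit scale) gives the magnitude, which is the kind of reading the abstract refers to.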

Acknowledgements

To the Brazilian research agency CNPq.

Author information

Corresponding author

Correspondence to Luiz Henrique dos S. Fernandes.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernandes, L.H.d.S., de Souto, M.C.P., Lorena, A.C. (2021). Evaluating Data Characterization Measures for Clustering Problems in Meta-learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_51

  • DOI: https://doi.org/10.1007/978-3-030-92185-9_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92184-2

  • Online ISBN: 978-3-030-92185-9

  • eBook Packages: Computer Science, Computer Science (R0)
