Evaluating Data Characterization Measures for Clustering Problems in Meta-learning

Fernandes, Luiz Henrique dos S.; de Souto, Marcilio C. P.; Lorena, Ana C.

doi:10.1007/978-3-030-92185-9_51

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13108)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

An accurate data characterization is essential for a reliable selection of clustering algorithms via meta-learning. This work evaluates a set of measures for characterizing clustering problems, using beta regression and two well-known machine learning regression techniques as meta-models. We observed that a subset of the meta-features is more effective at characterizing the clustering datasets. In addition, secondary findings made it possible to verify the direction and magnitude of the influence of these measures, as well as their importance, in predicting the performance of the algorithms under analysis.
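As a rough illustration of how a beta regression meta-model relates meta-features to a performance score, the sketch below evaluates the log-likelihood of the standard mean-precision parameterization with a logit link. It is not the authors' implementation: the toy data, the coefficient values, and the names `logistic` and `beta_loglik` are all assumptions made for this example, and it assumes the performance score lies strictly inside (0, 1).

```python
import math


def logistic(z):
    """Inverse logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def beta_loglik(y, X, coefs, phi):
    """Log-likelihood of a beta regression with a logit link.

    y     : performance scores, each strictly inside (0, 1)
    X     : one row of meta-feature values per dataset (hypothetical data)
    coefs : [intercept, c1, c2, ...], one coefficient per meta-feature
    phi   : precision parameter (> 0); larger means less dispersion
    """
    total = 0.0
    for yi, xi in zip(y, X):
        eta = coefs[0] + sum(c * x for c, x in zip(coefs[1:], xi))
        mu = logistic(eta)
        # Mean-precision parameterization: y ~ Beta(mu * phi, (1 - mu) * phi)
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total


# Made-up toy data: two meta-features and one performance score per dataset.
X = [[0.2, 1.0], [0.8, 0.5], [0.5, 0.1], [0.9, 0.9]]
y = [0.35, 0.80, 0.55, 0.90]

# The score rises with the first meta-feature, so a positive coefficient
# on it should explain the data better than a negative one.
ll_pos = beta_loglik(y, X, [-1.0, 2.5, 0.0], phi=20.0)
ll_neg = beta_loglik(y, X, [1.0, -2.5, 0.0], phi=20.0)
print(ll_pos > ll_neg)
```

In a model of this form, the sign of a fitted coefficient gives the direction of a meta-feature's influence on the expected score, and its size on the logit scale gives the magnitude. In practice the coefficients and `phi` would be estimated by maximizing this log-likelihood rather than fixed by hand.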

Acknowledgements

To the Brazilian research agency CNPq.

Author information

Authors and Affiliations

Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, 12228-900, Brazil
Luiz Henrique dos S. Fernandes & Ana C. Lorena
Université d’Orléans, Orléans, 45100, France
Marcilio C. P. de Souto

Corresponding author

Correspondence to Luiz Henrique dos S. Fernandes.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Fernandes, L.H.d.S., de Souto, M.C.P., Lorena, A.C. (2021). Evaluating Data Characterization Measures for Clustering Problems in Meta-learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_51

DOI: https://doi.org/10.1007/978-3-030-92185-9_51
Published: 06 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92184-2
Online ISBN: 978-3-030-92185-9
eBook Packages: Computer Science, Computer Science (R0)
