On Measuring the Complexity of Classification Problems

Lorena, Ana Carolina; de Souto, Marcilio C. P.

doi:10.1007/978-3-319-26532-2_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9489)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

There has been growing interest in describing the difficulty of solving a classification problem. This knowledge can be used, among other things, to support better-grounded decisions about data pre-processing and to guide the development of new data-driven pattern recognition techniques. A variety of measures that estimate the intrinsic complexity of a classification problem can be extracted from a training data set. This paper presents some of these measures and analyzes them theoretically.

A.C. Lorena—Acknowledgements to the Brazilian Research Agencies FAPESP and CNPq.

Author information

Authors and Affiliations

Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo, Parque Tecnológico, São José dos Campos, SP, Brazil
Ana Carolina Lorena
Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans, France
Marcilio C. P. de Souto

Corresponding author

Correspondence to Ana Carolina Lorena.

Editor information

Editors and Affiliations

University of Istanbul, Istanbul, Turkey
Sabri Arik
University at Qatar, Doha, Qatar
Tingwen Huang
Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia
Weng Kin Lai
University of Science Technology, Wuhan, China
Qingshan Liu

Cite this paper

Lorena, A.C., de Souto, M.C.P. (2015). On Measuring the Complexity of Classification Problems. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_18

DOI: https://doi.org/10.1007/978-3-319-26532-2_18
Published: 12 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer Science, Computer Science (R0)
