Abstract
There has been a growing interest in describing the difficulty of solving a classification problem. This knowledge can be used, among other things, to support more grounded decisions concerning data pre-processing, as well as for the development of new data-driven pattern recognition techniques. Indeed, to estimate the intrinsic complexity of a classification problem, there are a variety of measures that can be extracted from a training data set. This paper presents some of them, performing a theoretical analysis.
A.C. Lorena—Acknowledgements to the Brazilian Research Agencies FAPESP and CNPq.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Antolnez, N.M.: Data complexity in supervised learning: a far-reaching implication. Ph.D. thesis, La Salle, Universitat Ramon Llull (2011)
Basu, M., Ho, T.K.: Data Complexity in Pattern Recognition. Springer, London (2006)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
Cummins, L.: Combining and choosing case base maintenance algorithms. Ph.D. thesis, National University of Ireland, Cork (2013)
Dong, M., Kothari, R.: Feature subset selection using a new definition of classificability. PRL 24, 1215–1225 (2003)
Flores, M.J., Gámez, J.A., Martínez, A.M.: Domains of competence of the semi-naive bayesian network classifiers. Inf. Sci. 260, 120–148 (2014)
Garcia, L.P.F., de Carvalho, A.C.P.L.F., Lorena, A.C.: Effect of label noise in the complexity of classification problems. Neurocomputing (accepted) (2015, in press)
Ho, T.K., Basu, M.: Complexity measures of supervised classification problems. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 289–300 (2002)
Hoekstra, A., Duin, R.P.: On the nonlinearity of pattern classifiers. In: Proceedings of the 13th International Conference on Pattern Recognition, vol. 4, pp. 271–275. IEEE (1996)
Hu, Q., Pedrycz, W., Yu, D., Lang, J.: Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40(1), 137–150 (2010)
Li, L., Abu-Mostafa, Y.S.: Data complexity in machine learning. Technical Report CaltechCSTR:2006.004, Caltech Computer Science (2006)
Lorena, A.C., Costa, I.G., Spolar, N., Souto, M.C.P.: Analysis of complexity indices for classification problems: cancer gene expression data. Neurocomputing 75, 33–42 (2012)
Luengo, J., Herrera, F.: Shared domains of competence of approximate learning models using measures of separability of classes. Inf. Sci. 185(1), 43–65 (2012)
Mansilla, E.B., Ho, T.K.: On classifier domains of competence. In: Proceedings of the 17th ICPR, pp. 136–139 (2004)
Mollineda, R.A., Sánchez, J.S., Sotoca, J.M.: Data characterization for effective prototype selection. In: Marques, J.S., Pérez de la Blanca, N., Pina, P. (eds.) IbPRIA 2005. LNCS, vol. 3523, pp. 27–34. Springer, Heidelberg (2005)
Orriols-Puig, A., Maci, N., Ho, T.K.: Documentation for the data complexity library in c++. Technical report, La Salle - Universitat Ramon Llull (2010)
Singh, S.: Multiresolution estimates of classification complexity. IEEE Trans. PAMI 25, 1534–1539 (2003)
Smith, M.R., Martinez, T., Giraud-Carrier, C.: An instance level analysis of data complexity. Mach. Learn. 95(2), 225–256 (2014)
Souto, M.C.P., Lorena, A.C., Spolar, N., Costa, I.G.: Complexity measures of supervised classification tasks: a case study for cancer gene expression data. In: Proceedings of IJCNN, pp. 1352–1358 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lorena, A.C., de Souto, M.C.P. (2015). On Measuring the Complexity of Classification Problems. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science(), vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_18
Download citation
DOI: https://doi.org/10.1007/978-3-319-26532-2_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer ScienceComputer Science (R0)