Abstract
Supervised classification is one of the most widely used methods in machine learning. When data are characterized by a large number of features, a critical issue is how to deal with redundant or irrelevant information. To this end, an effective algorithm needs to identify a suitable subset of features, as small as possible, for the classification. In this work we present ReGEC_L1, a classifier with embedded feature selection based on the Regularized Generalized Eigenvalue Classifier (ReGEC) and equipped with an L1-norm regularization term. We detail the mathematical formulation and the numerical algorithm. Numerical results, obtained on de facto standard benchmark data sets, show that the proposed approach selects a small subset of the features without losing classification accuracy. In that respect, our algorithm seems to compare favorably with the SVM_L1 method. A MATLAB implementation of ReGEC_L1 is available at http://www.na.icar.cnr.it/~mariog/regec_l1.html.
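To make the underlying idea concrete, the following Python sketch illustrates a plain two-class generalized eigenvalue classifier: each class gets a hyperplane that is close to its own points and far from the other class, obtained from the smallest eigenpair of a symmetric generalized eigenvalue problem. This is not the ReGEC_L1 algorithm and not the authors' MATLAB code. It uses a simple Tikhonov term in place of the ReGEC regularization, omits the L1-norm term entirely, and the names `fit_plane`, `predict`, and `delta` as well as the toy data are assumptions made for illustration only.

```python
import numpy as np
from scipy.linalg import eigh


def fit_plane(own, other, delta=1e-3):
    """Return (w, gamma) of a plane w.x - gamma = 0 near `own`, far from `other`.

    Illustrative only: `delta` is a hypothetical Tikhonov parameter that keeps
    both matrices positive definite; no L1 term is applied here.
    """
    # Append a column of -1 so that [X, -e] @ [w; gamma] = X w - gamma.
    g_own = np.hstack([own, -np.ones((own.shape[0], 1))])
    g_other = np.hstack([other, -np.ones((other.shape[0], 1))])
    dim = g_own.shape[1]
    G = g_own.T @ g_own + delta * np.eye(dim)
    H = g_other.T @ g_other + delta * np.eye(dim)
    # Minimising z'Gz / z'Hz is solved by the eigenvector of G z = lambda H z
    # with the smallest eigenvalue; eigh returns eigenvalues in ascending order.
    _, vecs = eigh(G, H)
    z = vecs[:, 0]
    return z[:-1], z[-1]


def predict(X, planes):
    """Assign each row of X to the class whose plane is nearest."""
    dist = np.column_stack(
        [np.abs(X @ w - gamma) / np.linalg.norm(w) for w, gamma in planes]
    )
    return dist.argmin(axis=1)


# Toy data (assumed): class 0 lies near the line y = x, class 1 near y = -x.
rng = np.random.default_rng(0)
t = np.linspace(-2.0, 2.0, 40)
A = np.column_stack([t, t + 0.05 * rng.standard_normal(t.size)])
B = np.column_stack([t, -t + 0.05 * rng.standard_normal(t.size)])

planes = [fit_plane(A, B), fit_plane(B, A)]
pred = predict(np.array([[1.5, 1.5], [1.5, -1.5]]), planes)
```

In this sketch every feature generally receives a nonzero weight in `w`. An L1-norm penalty on `w`, as used in ReGEC_L1, is the ingredient that would push some of those weights to exactly zero and thereby perform feature selection.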
Acknowledgments
Mara Sangiovanni was supported by the Interomics Italian Flagship Project and MIUR PON02-00612. Mario Guarracino and Gerardo Toraldo were partially supported by INdAM-GNCS, under the 2015 Project “Numerical Methods for Nonconvex/Nonsmooth Optimization and Applications”. Mario Guarracino's work was conducted at the National Research University Higher School of Economics (HSE) and was supported by RSF Grant No. 14-41-00039. Marco Viola's work was performed during his undergraduate internship at the Institute for High Performance Computing and Networking (ICAR) of the National Research Council (CNR).
Cite this article
Viola, M., Sangiovanni, M., Toraldo, G. et al. A generalized eigenvalues classifier with embedded feature selection. Optim Lett 11, 299–311 (2017). https://doi.org/10.1007/s11590-015-0955-7
DOI: https://doi.org/10.1007/s11590-015-0955-7