
Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 147))

Abstract

The advent of Big Data, and especially of datasets with high dimensionality, has made it essential to identify the relevant features of the data. In this scenario the importance of feature selection is beyond doubt, and many methods have been developed, although researchers do not agree on which one is best for any given setting. This chapter provides the reader with the foundations of feature selection (see Sect. 2.1) as well as a description of state-of-the-art feature selection methods (Sect. 2.2). These methods are then analyzed on several synthetic datasets (Sect. 2.3) in an attempt to draw conclusions about their performance when dealing with a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, and a small ratio between the number of samples and the number of features. Finally, in Sect. 2.4, some state-of-the-art methods are analyzed to study their scalability, i.e. the impact of an increase in the size of the training set on the computational performance of an algorithm in terms of accuracy, training time and stability.

Part of the content of this chapter was previously published in Knowledge and Information Systems (https://doi.org/10.1007/s10115-012-0487-8 and https://doi.org/10.1007/s10115-017-1140-3).
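To make the filter idea behind many of the methods discussed in the chapter concrete, the following is a minimal sketch, not taken from the chapter itself: a univariate filter that scores each feature by the absolute Pearson correlation with the class label and keeps the top-ranked ones. The synthetic dataset, the scoring choice and all names here are illustrative assumptions.

```python
# Sketch of a univariate filter for feature selection (illustrative only):
# rank features by |Pearson correlation| with a binary class label.
import random
import statistics

random.seed(42)

n_samples, n_features = 500, 10
X = [[random.random() for _ in range(n_features)] for _ in range(n_samples)]
# By construction, only features 0 and 1 determine the class;
# the remaining eight features are irrelevant noise.
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

# Score every feature and rank them from most to least relevant.
scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
ranking = sorted(range(n_features), key=lambda j: -scores[j])
top2 = ranking[:2]
print(top2)  # the two truly relevant features should rank first
```

On data like this, the two relevant features receive clearly higher scores than the noise features, which is exactly the behavior the synthetic-dataset experiments in Sect. 2.3 are designed to probe, including how it degrades as irrelevant or redundant features are added.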



Author information


Correspondence to Verónica Bolón-Canedo.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Bolón-Canedo, V., Alonso-Betanzos, A. (2018). Feature Selection. In: Recent Advances in Ensembles for Feature Selection. Intelligent Systems Reference Library, vol 147. Springer, Cham. https://doi.org/10.1007/978-3-319-90080-3_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-90080-3_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90079-7

  • Online ISBN: 978-3-319-90080-3

  • eBook Packages: Engineering, Engineering (R0)
