Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML

Post, Martijn J.; van der Putten, Peter; van Rijn, Jan N.

doi:10.1007/978-3-319-46349-0_14

Martijn J. Post¹⁷,
Peter van der Putten¹⁷ &
Jan N. van Rijn¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9897))

Included in the following conference series:

International Symposium on Intelligent Data Analysis

1927 Accesses
12 Citations

Abstract

It is often claimed that data pre-processing is an important factor contributing towards the performance of classification algorithms. In this paper we investigate feature selection, a common data pre-processing technique. We conduct a large scale experiment and present results on what algorithms and data sets benefit from this technique. Using meta-learning we can find out for which combinations this is the case. To complement a large set of meta-features, we introduce the Feature Selection Landmarkers, which prove useful for this task. All our experimental results are made publicly available on OpenML.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Full details: http://www.openml.org/s/15.

References

Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97(1–2), 245–271 (1997)
Article MathSciNet MATH Google Scholar
Brazdil, P., Gama, J., Henery, B.: Characterizing the applicability of classification algorithms using meta-level learning. In: Bergadano, F., Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 83–102. Springer, Heidelberg (1994). doi:10.1007/3-540-57868-4_52
Chapter Google Scholar
Carpenter, J.: May the best analyst win. Science 331(6018), 698–699 (2011)
Article Google Scholar
Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)
Article Google Scholar
Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(3), 131–156 (1997)
Article Google Scholar
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
MATH Google Scholar
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
Article Google Scholar
Hall, M.A.: Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand (1998)
Google Scholar
John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers Inc. (1995)
Google Scholar
Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
Article MATH Google Scholar
Peng, Y., Flach, P.A., Soares, C., Brazdil, P.: Improved dataset characterisation for meta-learning. In: Lange, S., Satoh, K., Smith, C.H. (eds.) DS 2002. LNCS, vol. 2534, pp. 141–152. Springer, Heidelberg (2002). doi:10.1007/3-540-36182-0_14
Chapter Google Scholar
Pfahringer, B., Bensusan, H., Giraud-Carrier, C.: Tell me who can learn you and I can tell you who you are: landmarking various learning algorithms. In: Proceedings of the 17th International Conference on Machine Learning, pp. 743–750 (2000)
Google Scholar
Pinto, F., Soares, C., Mendes-Moreira, J.: Towards automatic generation of metafeatures. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS (LNAI), vol. 9651, pp. 215–226. Springer, Heidelberg (2016). doi:10.1007/978-3-319-31753-3_18
Chapter Google Scholar
van der Putten, P., van Someren, M.: A bias-variance analysis of a real world learning problem: the coil challenge 2000. Mach. Learn. 57(1), 177–195 (2004)
Article MATH Google Scholar
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Google Scholar
Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: popular nearest neighbors in high-dimensional data. JMLR 11, 2487–2531 (2010)
MathSciNet MATH Google Scholar
Rice, J.R.: The algorithm selection problem. Adv. Comput. 15, 65–118 (1976)
Article Google Scholar
van Rijn, J.N., Abdulrahman, S.M., Brazdil, P., Vanschoren, J.: Fast algorithm selection using learning curves. In: Fromont, E., Bie, T., Leeuwen, M. (eds.) IDA 2015. LNCS, vol. 9385, pp. 298–309. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24465-5_26
Chapter Google Scholar
Tsamardinos, I., Aliferis, C.: Towards principled feature selection: relevancy, filters and wrappers. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (2003)
Google Scholar
Vanschoren, J., van Rijn, J.N., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. ACM SIGKDD Explor. Newsl. 15(2), 49–60 (2014)
Article Google Scholar
Verikas, A., Bacauskiene, M.: Feature selection with neural networks. Pattern Recogn. Lett. 23(11), 1323–1335 (2002)
Article MATH Google Scholar
Vilalta, R., Giraud-Carrier, C.G., Brazdil, P., Soares, C.: Using meta-learning to support data mining. IJCSA 1(1), 31–45 (2004)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Leiden University, Leiden, The Netherlands
Martijn J. Post, Peter van der Putten & Jan N. van Rijn

Authors

Martijn J. Post
View author publications
You can also search for this author in PubMed Google Scholar
Peter van der Putten
View author publications
You can also search for this author in PubMed Google Scholar
Jan N. van Rijn
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Martijn J. Post .

Editor information

Editors and Affiliations

Stockholm University , Stockholm, Sweden
Henrik Boström
Leiden University , Leiden, The Netherlands
Arno Knobbe
University of Porto , Porto, Portugal
Carlos Soares
Stockholm University , Stockholm, Sweden
Panagiotis Papapetrou

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Post, M.J., van der Putten, P., van Rijn, J.N. (2016). Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science(), vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_14

Download citation

DOI: https://doi.org/10.1007/978-3-319-46349-0_14
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics