Skip to main content

Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML

  • Conference paper
  • First Online:
Advances in Intelligent Data Analysis XV (IDA 2016)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9897))

Included in the following conference series:

Abstract

It is often claimed that data pre-processing is an important factor contributing towards the performance of classification algorithms. In this paper we investigate feature selection, a common data pre-processing technique. We conduct a large scale experiment and present results on what algorithms and data sets benefit from this technique. Using meta-learning we can find out for which combinations this is the case. To complement a large set of meta-features, we introduce the Feature Selection Landmarkers, which prove useful for this task. All our experimental results are made publicly available on OpenML.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Full details: http://www.openml.org/s/15.

References

  1. Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97(1–2), 245–271 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  2. Brazdil, P., Gama, J., Henery, B.: Characterizing the applicability of classification algorithms using meta-level learning. In: Bergadano, F., Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 83–102. Springer, Heidelberg (1994). doi:10.1007/3-540-57868-4_52

    Chapter  Google Scholar 

  3. Carpenter, J.: May the best analyst win. Science 331(6018), 698–699 (2011)

    Article  Google Scholar 

  4. Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)

    Article  Google Scholar 

  5. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(3), 131–156 (1997)

    Article  Google Scholar 

  6. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

    MATH  Google Scholar 

  7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

    Article  Google Scholar 

  8. Hall, M.A.: Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand (1998)

    Google Scholar 

  9. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers Inc. (1995)

    Google Scholar 

  10. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)

    Article  MATH  Google Scholar 

  11. Peng, Y., Flach, P.A., Soares, C., Brazdil, P.: Improved dataset characterisation for meta-learning. In: Lange, S., Satoh, K., Smith, C.H. (eds.) DS 2002. LNCS, vol. 2534, pp. 141–152. Springer, Heidelberg (2002). doi:10.1007/3-540-36182-0_14

    Chapter  Google Scholar 

  12. Pfahringer, B., Bensusan, H., Giraud-Carrier, C.: Tell me who can learn you and I can tell you who you are: landmarking various learning algorithms. In: Proceedings of the 17th International Conference on Machine Learning, pp. 743–750 (2000)

    Google Scholar 

  13. Pinto, F., Soares, C., Mendes-Moreira, J.: Towards automatic generation of metafeatures. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS (LNAI), vol. 9651, pp. 215–226. Springer, Heidelberg (2016). doi:10.1007/978-3-319-31753-3_18

    Chapter  Google Scholar 

  14. van der Putten, P., van Someren, M.: A bias-variance analysis of a real world learning problem: the coil challenge 2000. Mach. Learn. 57(1), 177–195 (2004)

    Article  MATH  Google Scholar 

  15. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

    Google Scholar 

  16. Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: popular nearest neighbors in high-dimensional data. JMLR 11, 2487–2531 (2010)

    MathSciNet  MATH  Google Scholar 

  17. Rice, J.R.: The algorithm selection problem. Adv. Comput. 15, 65–118 (1976)

    Article  Google Scholar 

  18. van Rijn, J.N., Abdulrahman, S.M., Brazdil, P., Vanschoren, J.: Fast algorithm selection using learning curves. In: Fromont, E., Bie, T., Leeuwen, M. (eds.) IDA 2015. LNCS, vol. 9385, pp. 298–309. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24465-5_26

    Chapter  Google Scholar 

  19. Tsamardinos, I., Aliferis, C.: Towards principled feature selection: relevancy, filters and wrappers. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (2003)

    Google Scholar 

  20. Vanschoren, J., van Rijn, J.N., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. ACM SIGKDD Explor. Newsl. 15(2), 49–60 (2014)

    Article  Google Scholar 

  21. Verikas, A., Bacauskiene, M.: Feature selection with neural networks. Pattern Recogn. Lett. 23(11), 1323–1335 (2002)

    Article  MATH  Google Scholar 

  22. Vilalta, R., Giraud-Carrier, C.G., Brazdil, P., Soares, C.: Using meta-learning to support data mining. IJCSA 1(1), 31–45 (2004)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Martijn J. Post .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Post, M.J., van der Putten, P., van Rijn, J.N. (2016). Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science(), vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46349-0_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46348-3

  • Online ISBN: 978-3-319-46349-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics