
Controlling Costs in Feature Selection: Information Theoretic Approach

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Abstract

Feature selection in supervised classification is a crucial task in many biomedical applications. Most existing approaches assume that all features have the same cost. In many medical applications, however, this assumption is inappropriate, as acquiring the values of some features can be costly. For example, in medical diagnosis, each value obtained from a clinical test carries its own cost. Costs can also be non-financial, for example, the difference between an invasive exploratory surgery and a simple blood test. In such cases, the goal is to select a subset of features associated with the class variable (e.g., the occurrence of a disease) within a user-specified budget. We consider a general information-theoretic framework that allows the costs of features to be controlled. The proposed criterion consists of two components: the first describes the relevance of a feature and the second penalizes its cost. We introduce a cost factor that controls the trade-off between these two components, and we propose a procedure in which its optimal value is chosen in a data-driven way. Experiments on artificial and real medical datasets indicate that, when the budget is limited, the proposed approach is superior to traditional feature selection methods. The framework has been implemented in an open-source library (Python package: https://github.com/kaketo/bcselector).
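
To make the criterion described above concrete, the sketch below shows one way such cost-aware greedy selection could look in Python. It is an illustrative assumption, not the authors' method and not the API of the bcselector package: it scores each candidate feature as relevance minus a cost penalty, J(f) = I(f; y) − λ·cost(f), estimates relevance with scikit-learn's mutual_info_classif, and keeps adding features until the user-specified budget is exhausted. All function and variable names are hypothetical.

```python
# Minimal sketch of cost-aware greedy feature selection (illustrative only;
# the criterion form, names, and budget handling are assumptions based on
# the abstract, not the paper's exact formulation or the bcselector API).
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def greedy_cost_aware_selection(X, y, costs, budget, cost_factor):
    """Select features within `budget`, trading relevance against cost.

    X           : (n_samples, n_features) feature matrix
    y           : (n_samples,) class labels
    costs       : (n_features,) acquisition cost of each feature
    budget      : maximum total cost of the selected subset
    cost_factor : lambda, weight of the cost penalty in the criterion
    """
    relevance = mutual_info_classif(X, y)  # estimate I(f; y) for each feature
    selected, spent = [], 0.0
    remaining = set(range(X.shape[1]))
    while remaining:
        # Only consider features that still fit within the remaining budget.
        affordable = [f for f in remaining if spent + costs[f] <= budget]
        if not affordable:
            break
        # Criterion: relevance minus a penalty proportional to the feature's cost.
        scores = {f: relevance[f] - cost_factor * costs[f] for f in affordable}
        best = max(scores, key=scores.get)
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected


# Example usage on synthetic data: only the first two features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
costs = rng.uniform(1.0, 5.0, size=10)
print(greedy_cost_aware_selection(X, y, costs, budget=8.0, cost_factor=0.1))
```

In the same spirit, the cost factor could be chosen in a data-driven way, for example by running the selection for several candidate values of λ and keeping the value for which a downstream classifier performs best within the budget; the paper proposes its own procedure for this choice.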


Author information

Correspondence to Paweł Teisseyre.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Teisseyre, P., Klonecki, T. (2021). Controlling Costs in Feature Selection: Information Theoretic Approach. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol. 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_37


  • DOI: https://doi.org/10.1007/978-3-030-77964-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77963-4

  • Online ISBN: 978-3-030-77964-1

  • eBook Packages: Computer Science, Computer Science (R0)
