
Improving the Stability of Variable Selection for Industrial Datasets

Neural Advances in Processing Nonlinear Dynamic Signals (WIRN 2017)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 102)

Abstract

Variable reduction is an essential step in data mining: by removing redundant and irrelevant input variables, it can improve both the performance of machine learning models and the understanding of the underlying process. This paper presents a variable selection approach that combines a dominating set procedure for redundancy analysis with a wrapper approach, in order to obtain an informative, non-redundant subset of variables while improving stability and reducing computational complexity. The proposed approach is tested on several datasets from the UCI repository and from industrial contexts, and is compared to exhaustive variable selection, which is often considered optimal in terms of system performance. Moreover, the novel method is applied to both classification and regression tasks.
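The two-stage idea in the abstract can be illustrated with a minimal sketch. The full text of the chapter is not reproduced here, so the details below are assumptions rather than the authors' procedure: variables are linked in a redundancy graph when their absolute Pearson correlation reaches a hypothetical threshold, a greedy heuristic picks a dominating set (a set of vertices such that every other vertex is adjacent to at least one member), and a sequential forward wrapper built on a leave-one-out 1-nearest-neighbour classifier then searches only among the retained variables. All function names and the toy data are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def redundancy_graph(columns, threshold=0.9):
    """Link two variables when |r| >= threshold (assumed redundancy criterion)."""
    adj = {i: set() for i in range(len(columns))}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_dominating_set(adj):
    """Greedy heuristic: pick the vertex that dominates the most uncovered vertices."""
    undominated, chosen = set(adj), []
    while undominated:
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.append(best)
        undominated -= {best} | adj[best]
    return sorted(chosen)

def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    correct = 0
    for i, xi in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(rows):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

def forward_wrapper(rows, labels, candidates):
    """Sequential forward selection: add a variable only if accuracy improves."""
    selected, best_score, remaining = [], 0.0, list(candidates)
    while remaining:
        scored = [(loo_1nn_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score

# Toy data (illustrative only): x1 duplicates x0, x3 duplicates x2.
x0 = [i - 19.5 for i in range(40)]          # informative variable
x1 = [2 * v + 1 for v in x0]                # linear copy of x0 (redundant)
x2 = [(-1.0) ** i for i in range(40)]       # irrelevant variable
x3 = [-v for v in x2]                       # negated copy of x2 (redundant)
columns = [x0, x1, x2, x3]
rows = list(zip(*columns))
labels = [1 if v > 0 else 0 for v in x0]

reduced = greedy_dominating_set(redundancy_graph(columns))
selected, score = forward_wrapper(rows, labels, reduced)
print("after redundancy analysis:", reduced)
print("after wrapper:", selected, "accuracy:", score)
```

In this sketch the redundancy stage shrinks the candidate pool before the wrapper runs, so the wrapper trains far fewer models than a search over all variables would. Whether the chapter uses the same graph construction, heuristic, or learner is not stated in the abstract.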



Acknowledgements

The work presented in this paper was developed within the project entitled “Piattaforma Integrata Avanzata per la Progettazione di Macchine e Sistemi Complessi” (PROMAS), which was co-funded under Tuscany POR FESR 2014–2020.

Corresponding author

Correspondence to Silvia Cateni.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Cateni, S., Colla, V., Iannino, V. (2019). Improving the Stability of Variable Selection for Industrial Datasets. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_19
