A New Proposed Feature Subset Selection Algorithm Based on Maximization of Gain Ratio

  • Conference paper
Big Data Analytics (BDA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9498)

Abstract

Feature subset selection extracts the most relevant subset of the original features from a dataset. In this paper, we propose a new algorithm that filters features from a dataset using greedy stepwise forward selection. The proposed algorithm uses gain ratio as its greedy evaluation measure and applies a multiple-feature correlation technique to remove redundant features. We evaluate the proposed algorithm on the number of features, runtime, and the classification accuracy of three classifiers: Naïve Bayes, the tree-based C4.5, and the instance-based IB1. The results are compared with those of two other feature selection algorithms, the Fast Correlation-Based Filter (FCBF) and the fast clustering-based feature selection algorithm (FAST), over datasets of different dimensions and domains. A unified metric that combines all three parameters (number of features, runtime, and classification accuracy) is also used to compare the algorithms. The results show that the proposed algorithm improves significantly on the other feature selection algorithms for high-dimensional data when working on an image-domain dataset.
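To make the idea of gain-ratio-driven forward selection concrete, the following is a minimal sketch and not the authors' algorithm, whose details are not given in the abstract. It assumes discrete feature values, ranks features by gain ratio (information gain divided by the entropy of the feature itself), and adds them greedily. The redundancy check uses pairwise symmetric uncertainty as a stand-in for the paper's multiple-feature correlation technique; the function names and the `redundancy_threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(x, y):
    """Information gain H(y) - H(y | x) for discrete sequences x and y."""
    n = len(x)
    conditional = 0.0
    for value, count in Counter(x).items():
        subset = [yi for xi, yi in zip(x, y) if xi == value]
        conditional += (count / n) * entropy(subset)
    return entropy(y) - conditional


def gain_ratio(x, y):
    """Information gain of x about y, normalised by the entropy of x."""
    split_info = entropy(x)
    return info_gain(x, y) / split_info if split_info > 0 else 0.0


def symmetric_uncertainty(a, b):
    """Assumed redundancy measure: 2 * IG(a, b) / (H(a) + H(b))."""
    denom = entropy(a) + entropy(b)
    return 2.0 * info_gain(a, b) / denom if denom > 0 else 0.0


def greedy_forward_select(columns, y, redundancy_threshold=0.9):
    """Add features in order of decreasing gain ratio, skipping any feature
    that is too strongly correlated with one already selected.

    columns: dict mapping a feature name to its list of discrete values.
    redundancy_threshold: hypothetical cut-off, not taken from the paper.
    """
    scores = {name: gain_ratio(col, y) for name, col in columns.items()}
    ranked = sorted(columns, key=lambda name: scores[name], reverse=True)
    selected = []
    for name in ranked:
        if scores[name] <= 0:
            break  # remaining features carry no information about the class
        if all(
            symmetric_uncertainty(columns[name], columns[s]) < redundancy_threshold
            for s in selected
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = {
        "a": [0, 0, 1, 1],       # perfectly predictive
        "a_copy": [1, 1, 0, 0],  # redundant with "a"
        "noise": [0, 1, 0, 1],   # independent of the class
    }
    print(greedy_forward_select(features, labels))
```

In this toy input, `a_copy` carries the same information as `a` and `noise` has zero gain ratio, so only one of the two informative features is retained. Continuous features would need to be discretised before these measures apply.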

Author information

Corresponding author

Correspondence to Arpita Nagpal.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nagpal, A., Gaur, D. (2015). A New Proposed Feature Subset Selection Algorithm Based on Maximization of Gain Ratio. In: Kumar, N., Bhatnagar, V. (eds) Big Data Analytics. BDA 2015. Lecture Notes in Computer Science, vol 9498. Springer, Cham. https://doi.org/10.1007/978-3-319-27057-9_13

  • DOI: https://doi.org/10.1007/978-3-319-27057-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27056-2

  • Online ISBN: 978-3-319-27057-9

  • eBook Packages: Computer Science (R0)