Abstract
Feature selection removes redundant and irrelevant features from the original feature set of exemplars, so that a sparse, representative feature subset can be found for building a more efficient and accurate classifier. This paper presents novel definitions of the discernibility and independence scores of a feature, and then constructs a two-dimensional (2D) space, with a feature's independence as the y-axis and its discernibility as the x-axis, to rank features by importance. The new method is named FSDI (Feature Selection based on the Discernibility and Independence of a feature). The discernibility score measures how well a feature distinguishes instances from different classes, while the independence score measures a feature's redundancy. All features are plotted in the 2D space according to their discernibility and independence coordinates, and the area of the rectangle defined by a feature's two coordinates is used as the criterion to rank feature importance. The top-k features, whose importance is much higher than that of the remaining features, are selected to form the sparse, representative feature subset for building an efficient and accurate classifier. Experimental results on 5 classical gene expression datasets demonstrate that the proposed FSDI algorithm selects the gene subset efficiently and achieves the best classification performance. The method thus offers a good solution to the high time complexity that bottlenecks existing gene subset selection algorithms.
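The ranking scheme described above can be sketched in a few lines of NumPy. Note that the abstract does not give the paper's exact score definitions, so the two scoring functions below are illustrative stand-ins: discernibility is approximated by a Fisher-style between-class/within-class ratio, and independence by one minus a feature's maximum absolute correlation with any other feature. Only the final step, ranking by the rectangle area (the product of the two scores) and keeping the top-k features, follows the abstract directly.

```python
import numpy as np

def fsdi_rank(X, y, k):
    """Rank features by the FSDI rectangle-area criterion:
    importance = discernibility * independence.
    The two scores below are illustrative stand-ins, not the
    paper's exact definitions.
    X: (n_samples, n_features) array; y: (n_samples,) class labels.
    Returns the indices of the top-k features."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    between = np.zeros(n_features)
    within = np.zeros(n_features)
    for c in np.unique(y):
        Xc = X[y == c]
        # Between-class spread: class-mean deviation, weighted by class size.
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        # Within-class spread: deviation of samples from their class mean.
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    discernibility = between / (within + 1e-12)

    # Independence: 1 minus the maximum absolute correlation with any
    # other feature, so low redundancy yields a high score.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    independence = 1.0 - corr.max(axis=0)

    # Rectangle area in the 2D (discernibility, independence) space.
    importance = discernibility * independence
    return np.argsort(importance)[::-1][:k]
```

For example, on data where one feature strongly separates two classes while the rest are noise, that feature receives the largest rectangle area and is ranked first.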
Acknowledgements
We are much obliged to those who shared the gene expression datasets with us. This work is supported in part by the National Natural Science Foundation of China under Grant No. 31372250, by the Key Science and Technology Program of Shaanxi Province of China under Grant No. 2013K12-03-24, by the Fundamental Research Funds for the Central Universities under Grant No. GK201503067, and by the Innovation Funds of Graduate Programs at Shaanxi Normal University under Grant No. 2015CXS028.
© 2016 Springer International Publishing Switzerland
Cite this paper
Xie, J., Wang, M., Zhou, Y., Li, J. (2016). Coordinating Discernibility and Independence Scores of Variables in a 2D Space for Efficient and Accurate Feature Selection. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science(), vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_12
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8