Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer

Guo, Sheng-Bo; Lyu, Michael R.; Lok, Tat-Ming

doi:10.1007/11816102_49

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4115)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

With the development of microarray technology, microarray data are widely used in the diagnosis of cancer subtypes. However, accurate diagnosis of cancer subtypes remains a difficult problem. Building classifiers on key genes selected from microarray data is a promising approach, yet selecting genes that are relevant but non-redundant is complicated. The set of selected genes should be small enough to allow diagnosis even in regular laboratories and should ideally include genes involved in cancer-specific regulatory pathways. Whereas traditional gene selection methods address the classification of two categories of cancer, this paper proposes a novel gene selection algorithm based on mutual information for the classification of multi-class cancer using microarray data; the selected key genes are then fed into a classifier to classify the cancer subtypes. In our algorithm, mutual information is employed to select key genes related to class distinction. Application to breast cancer data suggests that the algorithm can identify the key genes for distinguishing the BRCA1-mutation, BRCA2-mutation, and sporadic classes, since it classifies the three types of breast cancer effectively and efficiently. Two more microarray datasets, leukemia and ovarian cancer data, are also used to validate the performance of our method. The results on these datasets demonstrate the high quality of our method. Based on the present work, our method can be widely used to discriminate different cancer subtypes, which will contribute to the development of technology for recovery from cancer.
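
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of scoring genes by their mutual information with a multi-class label. The equal-width discretization, the function names (mutual_information, discretize, rank_genes), and the toy data are all assumptions made for illustration; the paper's own discretization and its handling of redundancy between genes may differ.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def discretize(values, n_bins=2):
    """Equal-width binning (an assumed choice, not taken from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_genes(expression, labels, n_bins=2):
    """Rank genes by decreasing mutual information with the class label.

    expression: list of samples, each a list of per-gene expression values.
    Returns (gene indices in ranked order, per-gene MI scores).
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append(mutual_information(discretize(column, n_bins), labels))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


# Toy data invented for illustration: 6 samples, 3 genes, 3 classes.
expression = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.1],
    [1.0, 5.2, 0.9],
    [1.1, 0.9, 1.0],
    [2.0, 5.1, 1.2],
    [2.1, 1.1, 0.8],
]
labels = ["BRCA1", "BRCA1", "BRCA2", "BRCA2", "sporadic", "sporadic"]
order, scores = rank_genes(expression, labels, n_bins=3)
print(order, scores)
```

In this toy data the first gene tracks the class labels exactly, so it ranks first, while the second gene alternates within every class and carries no information about the label. A ranking like this scores each gene in isolation; it does not by itself remove redundant genes, which the abstract names as part of the selection problem.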

Author information

Authors and Affiliations

Department of Automation, University of Science and Technology of China, Hefei, Anhui, 230026, China
Sheng-Bo Guo
Intelligent Computation Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui, 230031, China
Sheng-Bo Guo
Computer Science & Engineering Dept., The Chinese University of Hong Kong, Shatin, Hong Kong
Michael R. Lyu
Information Engineering Dept., The Chinese University of Hong Kong, Shatin, Hong Kong
Tat-Ming Lok

Editor information

Editors and Affiliations

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
De-Shuang Huang
Queen’s University, Belfast, UK
Kang Li & George William Irwin

About this paper

Cite this paper

Guo, SB., Lyu, M.R., Lok, TM. (2006). Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer. In: Huang, DS., Li, K., Irwin, G.W. (eds) Computational Intelligence and Bioinformatics. ICIC 2006. Lecture Notes in Computer Science, vol 4115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816102_49

DOI: https://doi.org/10.1007/11816102_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37277-6
Online ISBN: 978-3-540-37282-0
eBook Packages: Computer Science, Computer Science (R0)
