Parsimonious Selection of Useful Genes in Microarray Gene Expression Data

Software Tools and Algorithms for Biological Systems

Abstract

Machine learning methods have recently contributed significantly to solving multidisciplinary problems in cancer classification from microarray gene expression data. These tasks are characterized by a large number of features and few observations, which makes modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection in combination with several off-the-shelf classifiers. Bootstrap resampling techniques yield more stable performance estimates. Our findings show that the proposed methodology permits a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes. A dimensionality reduction technique that preserves discrimination capabilities is used to visualize the selected genes.
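The general shape of such a pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's method: the abstract does not specify which entropic filter or which classifiers are used, so the sketch substitutes plain mutual-information ranking on already-discretized expression values and a 1-nearest-neighbour classifier with Hamming distance. All function names (`mutual_information`, `rank_genes`, `predict_1nn`, `bootstrap_accuracy`) are hypothetical. Accuracy is estimated on the out-of-bag samples of each bootstrap replicate, with gene selection repeated inside every replicate.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_genes(X, y, k):
    """Indices of the k genes (columns of X) with the highest MI with y."""
    n_genes = len(X[0])
    scores = [
        mutual_information([row[g] for row in X], y) for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def predict_1nn(train_X, train_y, x, genes):
    """Label of the training sample nearest to x (Hamming distance on genes)."""
    def dist(row):
        return sum(row[g] != x[g] for g in genes)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]


def bootstrap_accuracy(X, y, k, n_rounds=50, seed=0):
    """Mean out-of-bag accuracy over bootstrap replicates (assumed protocol)."""
    rng = random.Random(seed)
    n = len(X)
    accs = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        tr_X = [X[i] for i in idx]
        tr_y = [y[i] for i in idx]
        genes = rank_genes(tr_X, tr_y, k)  # select genes on training data only
        hits = sum(predict_1nn(tr_X, tr_y, X[i], genes) == y[i] for i in oob)
        accs.append(hits / len(oob))
    return sum(accs) / len(accs) if accs else float("nan")


# Toy data (invented for illustration): gene 0 tracks the class label.
X = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 1, 1, 1],
     [1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
top = rank_genes(X, y, 1)
acc = bootstrap_accuracy(X, y, k=1)
```

Repeating the gene selection inside each replicate keeps the out-of-bag samples unseen by both the filter and the classifier, which is what makes the averaged estimate meaningful for a setting with many features and few observations.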


Notes

  1. These figures were obtained on a standard x86 machine at 2.666 GHz.

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

González-Navarro, F.F., Belanche-Muñoz, L.A. (2011). Parsimonious Selection of Useful Genes in Microarray Gene Expression Data. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_5
