Abstract
Machine learning methods have recently made significant contributions to solving multidisciplinary problems in cancer classification from microarray gene expression data. These tasks are characterized by a large number of features and few observations, which makes modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection in combination with several off-the-shelf classifiers. Bootstrap resampling yields more stable performance estimates. Our findings show that the proposed methodology achieves a drastic reduction in dimension while offering attractive solutions in terms of both prediction accuracy and number of explanatory genes. A dimensionality reduction technique that preserves discrimination capabilities is used to visualize the selected genes.
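As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch ranks discretized genes by their mutual information with the class label and repeats the selection over bootstrap resamples to see how stable it is. The abstract does not specify the entropic measure, the discretization, or the classifiers, so the use of plain mutual information, the toy data, and all function names (`entropy`, `mutual_information`, `select_genes`, `bootstrap_indices`) are assumptions made for illustration only, not the authors' implementation.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_genes(rows, labels, k):
    """Indices of the k genes (columns) with the highest MI with the class."""
    n_genes = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


# Hypothetical toy data: 8 samples, 3 discretized genes.
# Gene 0 tracks the class, gene 1 is constant, gene 2 is unrelated to the class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [[y, 0, i % 2] for i, y in enumerate(labels)]

# Filter step on the full data: gene 0 is ranked first.
top_gene = select_genes(rows, labels, 1)

# Repeat the selection on bootstrap resamples and count how often
# each gene is chosen, as a simple stability check.
rng = random.Random(0)
selection_counts = Counter()
for _ in range(50):
    in_bag, _ = bootstrap_indices(len(rows), rng)
    boot_rows = [rows[i] for i in in_bag]
    boot_labels = [labels[i] for i in in_bag]
    selection_counts.update(select_genes(boot_rows, boot_labels, 1))
```

In a fuller version, the out-of-bag indices returned by `bootstrap_indices` would be used to score a classifier trained on the in-bag samples with the selected genes; that step is omitted here because the abstract does not name the classifiers.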
Notes
1. These figures were obtained on a standard x86 machine at 2.666 GHz.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
González-Navarro, F.F., Belanche-Muñoz, L.A. (2011). Parsimonious Selection of Useful Genes in Microarray Gene Expression Data. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_5
DOI: https://doi.org/10.1007/978-1-4419-7046-6_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)