Tumor Classification from Gene Expression Data: A Coding-Based Multiclass Learning Approach

  • Conference paper
Biological and Medical Data Analysis (ISBMDA 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3745)

Abstract

The effectiveness of cancer treatment depends strongly on an accurate diagnosis. In this paper we propose a system for the automatic and precise diagnosis of a tumor’s origin from genetic data. The system combines coding theory techniques with machine learning algorithms. Tumor classification is framed as a multiclass learning problem in which gene expression values are used to distinguish between tumor types. Since multiclass learning is intrinsically complex, the task is decomposed into several binary problems whose results are combined with an error-correcting linear block code. The prediction becomes more robust because the linear code corrects errors made by the base binary classifiers. Tested on real data from cancer patients, the system achieved promising results, with a best-case precision of 72%.
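
The decoding idea in the abstract can be illustrated with a minimal sketch. It assumes a (7,4) Hamming code, a hypothetical set of 10 classes, and stand-in bit vectors in place of trained binary classifiers. None of these choices come from the paper, which does not state its actual code or base learners here. Each class is assigned a codeword, each codeword position defines one binary problem, and the predicted class is the one whose codeword is closest in Hamming distance to the binary outputs.

```python
# Illustrative sketch only: the code, class count, and helper names are assumptions.

# Generator matrix of the (7,4) Hamming code in systematic form [I | P].
# Its minimum distance is 3, so any single bit error can be corrected.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(message):
    """Multiply a 4-bit message by G over GF(2) to obtain a 7-bit codeword."""
    return tuple(
        sum(m * g for m, g in zip(message, column)) % 2
        for column in zip(*G)
    )


def build_codebook(n_classes):
    """Assign each class label the codeword of its index written as a k-bit message."""
    k = len(G)
    if n_classes > 2 ** k:
        raise ValueError("this code can label at most 2**k classes")
    codebook = {}
    for label in range(n_classes):
        message = [(label >> (k - 1 - i)) & 1 for i in range(k)]
        codebook[label] = encode(message)
    return codebook


def hamming_distance(a, b):
    """Count the positions in which two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, codebook):
    """Return the class whose codeword is nearest to the binary classifier outputs."""
    return min(codebook, key=lambda label: hamming_distance(bits, codebook[label]))


codebook = build_codebook(10)

# Simulate the seven binary classifiers voting on a sample of class 5,
# with one of them (position 2) making a mistake.
received = list(codebook[5])
received[2] ^= 1
print(decode(received, codebook))  # 5
```

In a real system each of the seven bit positions would be produced by a binary classifier trained on gene expression values. With this particular code, one wrong classifier per sample is tolerated; stronger codes would tolerate more.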

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hüntemann, A., González, J.C., Tapia, E. (2005). Tumor Classification from Gene Expression Data: A Coding-Based Multiclass Learning Approach. In: Oliveira, J.L., Maojo, V., Martín-Sánchez, F., Pereira, A.S. (eds) Biological and Medical Data Analysis. ISBMDA 2005. Lecture Notes in Computer Science, vol 3745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573067_22

  • DOI: https://doi.org/10.1007/11573067_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29674-4

  • Online ISBN: 978-3-540-31658-9

  • eBook Packages: Computer Science, Computer Science (R0)
