
Data Mining in Proteomics with Learning Classifier Systems

Chapter in: Learning Classifier Systems in Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 125)

Summary

The era of data mining has renewed research effort in certain areas of biology that, because of their difficulty and our limited knowledge of them, were and still are considered unsolved problems. One such problem, and one of the fundamental open problems in computational biology, is the prediction of the 3D structure of proteins, known as protein structure prediction (PSP). Human experts, with the crucial help of data mining tools, are learning how proteins fold into their three-dimensional structure, but they are still far from providing perfect models for all kinds of proteins. Data mining and knowledge discovery are essential to advancing our understanding of the folding process. In this context, Learning Classifier Systems (LCS) are highly competitive tools: they have already proven competent in many different data mining tasks, and they provide human-readable solutions that can help experts understand the PSP problem. In this chapter we describe our recent efforts in applying LCS to PSP-related domains. Specifically, we focus on a relevant PSP subproblem called Coordination Number (CN) prediction. CN is a simplified profile of the 3D structure of a protein. Two sets of experiments are described: the first analyzes different ways to represent the basic composition of proteins, their primary sequence; the second assesses different data sources and problem-definition methods for performing competent CN prediction. In all experiments, LCS prove competent in terms of both prediction accuracy and explanatory power.
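To make the CN target more concrete, the following is a minimal Python sketch of how a CN-prediction dataset might be assembled; it is not the authors' pipeline. In general terms, the CN of a residue counts how many other residues lie within a fixed distance of it in the folded structure. Everything specific below is an assumption for illustration: one representative atom per residue, a 10 Å cutoff, skipping immediate chain neighbours, a two-state "low"/"high" split at a chosen threshold, and a sliding window over the primary sequence with an optional hypothetical HP (hydrophobic/polar) reduced alphabet. The names `coordination_numbers`, `discretize`, `window_instances` and `HP_MAP` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical HP (hydrophobic/polar) reduction of the 20 standard amino acids.
# The hydrophobic set below is one common choice, assumed for illustration only.
HYDROPHOBIC = set("ACFILMVW")
HP_MAP: Dict[str, str] = {aa: ("H" if aa in HYDROPHOBIC else "P")
                          for aa in "ACDEFGHIKLMNPQRSTVWY"}


def coordination_numbers(coords: Sequence[Coord], cutoff: float = 10.0,
                         min_separation: int = 2) -> List[int]:
    """For each residue, count the other residues whose representative atom
    lies within `cutoff` angstroms. Pairs closer than `min_separation`
    positions along the chain are skipped (assumption: this excludes each
    residue itself and its immediate chain neighbours)."""
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                counts[i] += 1  # the contact is symmetric, so credit both residues
                counts[j] += 1
    return counts


def discretize(cn_values: Sequence[int], threshold: float) -> List[str]:
    """Two-state target (assumed): 'high' if CN >= threshold, else 'low'."""
    return ["high" if v >= threshold else "low" for v in cn_values]


def window_instances(sequence: str, labels: Sequence[str], half_window: int = 4,
                     alphabet_map: Optional[Dict[str, str]] = None
                     ) -> List[Tuple[List[str], str]]:
    """Turn a chain into one classification example per residue: the
    attributes are the residues in a window of +/- half_window positions
    (optionally mapped to a reduced alphabet), the class is that residue's
    CN label. Positions beyond the chain ends are padded with '-'."""
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    n = len(sequence)
    instances = []
    for i in range(n):
        window = []
        for k in range(i - half_window, i + half_window + 1):
            if 0 <= k < n:
                aa = sequence[k]
                # Unknown residues map to 'X' under a reduced alphabet.
                window.append(alphabet_map.get(aa, "X") if alphabet_map else aa)
            else:
                window.append("-")
        instances.append((window, labels[i]))
    return instances


if __name__ == "__main__":
    # Toy 5-residue chain laid out on a straight line, 3.8 A between residues.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    cn = coordination_numbers(coords)
    labels = discretize(cn, threshold=2)
    for attrs, label in window_instances("ACDEF", labels, half_window=1,
                                         alphabet_map=HP_MAP):
        print(attrs, label)
```

The straight-line toy chain only exercises the arithmetic; real coordinates would come from experimentally determined structures. Comparing the same windows with and without `alphabet_map` mirrors, in a simplified way, the idea of testing different representations of the primary sequence, and the resulting attribute/class pairs are the kind of input a rule-learning system such as an LCS could be trained on.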




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bacardit, J., Stout, M., Hirst, J.D., Krasnogor, N. (2008). Data Mining in Proteomics with Learning Classifier Systems. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_2


  • DOI: https://doi.org/10.1007/978-3-540-78979-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78978-9

  • Online ISBN: 978-3-540-78979-6

  • eBook Packages: Engineering (R0)
