Data Mining in Proteomics with Learning Classifier Systems

Bacardit, Jaume; Stout, Michael; Hirst, Jonathan D.; Krasnogor, Natalio

doi:10.1007/978-3-540-78979-6_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 125)

Summary

The era of data mining has renewed research effort in areas of biology that, because of their difficulty and the lack of knowledge about them, were and still are considered unsolved problems. One such problem, among the fundamental open problems in computational biology, is the prediction of the 3D structure of proteins, or protein structure prediction (PSP). Human experts, with the crucial help of data mining tools, are learning how proteins fold to form their structure, but are still far from providing perfect models for all kinds of proteins. Data mining and knowledge discovery are essential to advance the understanding of the folding process. In this context, Learning Classifier Systems (LCS) are very competitive tools. They have proven competent in many different data mining tasks. Moreover, they provide human-readable solutions that can help experts understand the PSP problem. In this chapter we describe our recent efforts in applying LCS to PSP-related domains. Specifically, we focus on a relevant PSP subproblem called Coordination Number (CN) prediction. CN is a kind of simplified profile of the 3D structure of a protein. Two kinds of experiments are described: the first analyzes different ways to represent the basic composition of proteins, their primary sequence, and the second assesses different data sources and problem definition methods for performing competent CN prediction. In all the experiments, LCS show their competence in terms of both accurate predictions and explanatory power.
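
To make the notion of a coordination number concrete, the following Python sketch counts, for each residue, how many other residues lie within a distance cutoff. It is an illustration under stated assumptions, not the chapter's definition, which this summary does not give: the function name, the 10 Å default cutoff, the use of one representative 3D point per residue, and the exclusion of immediate chain neighbours are all hypothetical choices.

```python
import math


def coordination_numbers(coords, cutoff=10.0, min_separation=2):
    """Count, for each residue, the residues within `cutoff` of it.

    coords: list of (x, y, z) tuples, one representative point per residue
            (assumption: e.g. one atom per residue, in chain order).
    cutoff: distance threshold in the same units as coords (assumed value).
    min_separation: pairs closer than this along the chain are skipped,
            so direct chain neighbours do not count as contacts (assumption).
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # A contact is symmetric: it counts for both residues.
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy five-residue chain; the coordinates are made up for illustration.
toy_chain = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (3, 3, 0), (30, 0, 0)]
toy_cn = coordination_numbers(toy_chain, cutoff=5.0)
print(toy_cn)  # [1, 1, 0, 2, 0]
```

The resulting list is one number per residue, which is why CN can be read as a simplified, one-dimensional profile of the 3D structure. A prediction task would then try to recover this profile from the primary sequence alone.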

Author information

Authors and Affiliations

Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK
Jaume Bacardit, Michael Stout & Natalio Krasnogor
School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Jonathan D. Hirst

Editor information

Editors and Affiliations

School of Computer Science, University of the West of England, Bristol, BS16 1QY, UK
Larry Bull
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, 08022, Barcelona, Spain
Ester Bernadó-Mansilla
Centre for Clinical Epidemiology and Biostatistics, University of Pennsylvania, Philadelphia, PA, 19104, USA
John Holmes

About this chapter

Cite this chapter

Bacardit, J., Stout, M., Hirst, J.D., Krasnogor, N. (2008). Data Mining in Proteomics with Learning Classifier Systems. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_2

DOI: https://doi.org/10.1007/978-3-540-78979-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78978-9
Online ISBN: 978-3-540-78979-6
eBook Packages: Engineering, Engineering (R0)
