Automated Enzyme Classification by Formal Concept Analysis

  • Conference paper
Formal Concept Analysis (ICFCA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8478)

Abstract

Enzymes are macromolecules (linear sequences of linked molecules) whose catalytic activity makes them essential for biochemical reactions. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Predicting an enzyme’s functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of known enzymes already labeled with a family class. The task is difficult because activity depends on a combination of small sequence patterns, and because sequences have diverged greatly over time. This paper presents a classifier based on identifying subsequence blocks shared by known and new enzymes and on searching, for each class, for formal concepts built on the cross product of blocks and sequences. Since new enzyme families may emerge, it is also important to propose a first classification of enzymes that cannot be assigned to a known family. Formal Concept Analysis (FCA) offers a convenient framework for casting the task as an optimization problem over the set of concepts. The classifier has been tested successfully on the haloacid dehalogenase superfamily, a set of enzymes present in a large variety of species.
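To make the notion of a formal concept over blocks and sequences concrete, the following minimal Python sketch enumerates every concept of a small binary context. It is an illustration only, not the procedure used in the paper: the sequence names (`seq1`, ...), block names (`b1`, ...), and the function `formal_concepts` are hypothetical, and the enumeration is a naive one (closing the rows of the context under intersection) that is only practical for toy inputs.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    context: dict mapping each object (here, a sequence id) to the set of
    attributes it has (here, the blocks occurring in that sequence).
    """
    # Full attribute set: the intent of the empty set of objects.
    all_attrs = frozenset().union(*context.values())

    # Every intent is an intersection of object rows (or the full attribute
    # set), so we close the rows under intersection incrementally.
    intents = {all_attrs}
    for row in context.values():
        row = frozenset(row)
        intents |= {row & intent for intent in intents}

    # The extent of an intent is the set of objects having all its attributes.
    concepts = []
    for intent in intents:
        extent = frozenset(o for o, row in context.items() if intent <= row)
        concepts.append((extent, intent))
    return concepts


# Hypothetical toy context: which blocks occur in which sequences.
context = {
    "seq1": {"b1", "b2"},
    "seq2": {"b1", "b2", "b3"},
    "seq3": {"b2", "b3"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of sequences together with exactly the blocks they all share. In a classification setting, one such context could be built per known family, which is the sense in which the abstract speaks of concepts "for each class".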

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Coste, F., Garet, G., Groisillier, A., Nicolas, J., Tonon, T. (2014). Automated Enzyme Classification by Formal Concept Analysis. In: Glodeanu, C.V., Kaytoue, M., Sacarea, C. (eds) Formal Concept Analysis. ICFCA 2014. Lecture Notes in Computer Science, vol 8478. Springer, Cham. https://doi.org/10.1007/978-3-319-07248-7_17

  • DOI: https://doi.org/10.1007/978-3-319-07248-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07247-0

  • Online ISBN: 978-3-319-07248-7

  • eBook Packages: Computer Science (R0)
