Abstract
Enzymes are macromolecules (linear sequences of linked molecules) whose catalytic activity makes them essential for any biochemical reaction. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Predicting an enzyme’s functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of known enzymes labeled with a family class. This task is difficult because the activity depends on a combination of small sequence patterns and because sequences have greatly evolved over time. This paper presents a classifier based on the identification of subsequence blocks common to known and new enzymes and on the search for formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. Formal Concept Analysis (FCA) offers a convenient framework for casting the task as an optimization problem on the set of concepts. The classifier has been tested successfully on a particular set of enzymes present in a large variety of species, the haloacid dehalogenase superfamily.
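To make the notion of a formal concept over blocks and sequences concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy context, the block and sequence names, and the brute-force enumeration are all assumptions chosen for illustration. A concept is a pair (set of sequences, set of blocks) in which the sequences are exactly those containing all the blocks, and the blocks are exactly those shared by all the sequences.

```python
from itertools import combinations

# Hypothetical toy context: each sequence maps to the set of blocks it contains.
context = {
    "seq1": {"b1", "b2"},
    "seq2": {"b1", "b2", "b3"},
    "seq3": {"b2", "b3"},
}


def common_blocks(sequences, context):
    """Blocks shared by every sequence in `sequences` (all blocks if empty)."""
    shared = set().union(*context.values())
    for name in sequences:
        shared &= context[name]
    return shared


def sequences_with(blocks, context):
    """Sequences that contain every block in `blocks`."""
    return {name for name, owned in context.items() if blocks <= owned}


def formal_concepts(context):
    """Enumerate all formal concepts by closing every subset of sequences.

    Brute force, exponential in the number of sequences: suitable only
    for tiny illustrative contexts.
    """
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_blocks(subset, context)   # shared blocks
            extent = sequences_with(intent, context)  # closure of the subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In a classification setting one would build such a context per known family and then select concepts that discriminate between families. The selection criterion is the optimization problem mentioned in the abstract and is not reproduced here.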
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Coste, F., Garet, G., Groisillier, A., Nicolas, J., Tonon, T. (2014). Automated Enzyme Classification by Formal Concept Analysis. In: Glodeanu, C.V., Kaytoue, M., Sacarea, C. (eds) Formal Concept Analysis. ICFCA 2014. Lecture Notes in Computer Science, vol 8478. Springer, Cham. https://doi.org/10.1007/978-3-319-07248-7_17
DOI: https://doi.org/10.1007/978-3-319-07248-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07247-0
Online ISBN: 978-3-319-07248-7
eBook Packages: Computer Science (R0)