A polynomial-time maximum common subgraph algorithm for outerplanar graphs and its application to chemoinformatics

Schietgat, Leander; Ramon, Jan; Bruynooghe, Maurice

doi:10.1007/s10472-013-9335-0

Published: 12 March 2013

Annals of Mathematics and Artificial Intelligence, Volume 69, pages 343–376 (2013)


Abstract

Metrics for structured data have received increasing interest in the machine learning community. Graphs provide a natural representation for structured data, but many operations on graphs are computationally intractable. In this article, we present a polynomial-time algorithm that computes a maximum common subgraph of two outerplanar graphs. The algorithm makes use of the block-and-bridge preserving subgraph isomorphism, which has significant efficiency benefits and is also motivated from a chemical perspective. We focus on the application of learning structure-activity relationships, where the task is to predict the chemical activity of molecules. We show how the algorithm can be used to construct a metric for structured data, and we evaluate this metric, and more generally the block-and-bridge preserving matching operator, on 60 molecular datasets, obtaining state-of-the-art results in terms of predictive performance and efficiency.
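
As a rough illustration of how a maximum common subgraph (MCS) can yield a metric, the sketch below computes the well-known Bunke and Shearer distance, 1 - |mcs(G1, G2)| / max(|G1|, |G2|), from graph sizes that are assumed to be already known. The abstract does not state which metric the article constructs, so this is only one possible instance. The function name `mcs_distance` and its parameters are hypothetical, and measuring size as a vertex count is an assumption.

```python
def mcs_distance(size_g1: int, size_g2: int, size_mcs: int) -> float:
    """Bunke-Shearer style distance: 1 - |mcs| / max(|G1|, |G2|).

    All sizes are assumed to be vertex counts; computing the MCS itself
    (the hard part, which the article addresses) is outside this sketch.
    """
    if size_g1 < 0 or size_g2 < 0:
        raise ValueError("graph sizes must be non-negative")
    # A common subgraph cannot be larger than the smaller of the two graphs.
    if size_mcs < 0 or size_mcs > min(size_g1, size_g2):
        raise ValueError("MCS size must lie between 0 and the smaller graph size")
    largest = max(size_g1, size_g2)
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - size_mcs / largest
```

For example, two molecules with 6 and 8 atoms whose maximum common subgraph has 6 atoms would be at distance 1 - 6/8 = 0.25 under this definition, while identical graphs are at distance 0.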


Author information

Authors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Leander Schietgat, Jan Ramon & Maurice Bruynooghe

Corresponding author

Correspondence to Leander Schietgat.

About this article

Cite this article

Schietgat, L., Ramon, J. & Bruynooghe, M. A polynomial-time maximum common subgraph algorithm for outerplanar graphs and its application to chemoinformatics. Ann Math Artif Intell 69, 343–376 (2013). https://doi.org/10.1007/s10472-013-9335-0

Issue Date: December 2013
