Efficient Discovery of Embedded Patterns from Large Attributed Trees

Wu, Xiaoying; Theodoratos, Dimitri

doi:10.1007/978-3-319-91458-9_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10828)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Discovering informative patterns hidden deep in large tree datasets is an important research area with many practical applications. Many modern applications and systems represent, export, and exchange data in the form of trees whose nodes are associated with attributes. In this paper, we address the problem of mining frequent embedded attributed patterns from large attributed data trees. Attributed pattern mining requires combining tree mining with itemset mining, which results in a larger pattern search space than addressing either problem separately. We first design an interleaved pattern mining approach that extends the equivalence-class-based tree pattern enumeration technique with attribute-set enumeration. We then propose a novel layered approach that discovers all frequent attributed patterns in stages. This approach integrates an itemset mining technique with a recent unordered embedded tree pattern mining algorithm to greatly reduce the pattern search space. Our extensive experimental results on large real and synthetic tree datasets show that the layered approach achieves, in most cases, orders-of-magnitude performance improvements over both the interleaved mining method and the attribute-as-node embedded tree pattern mining method, and that it scales well.
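To make the notion of an embedded attributed pattern concrete, the following Python sketch checks whether a small pattern occurs in a data tree. It is a brute-force illustration under stated assumptions, not the interleaved or layered algorithm of the paper: a pattern node is assumed to match a data node when the labels are equal and the pattern node's attribute set is a subset of the data node's, a pattern edge is assumed to map to an ancestor-descendant path, and sibling pattern nodes are assumed to map to data nodes that are not ancestor-related. The names `Node`, `index_tree`, and `embeds` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    label: str
    attrs: FrozenSet[str] = frozenset()   # attribute set attached to the node
    children: List["Node"] = field(default_factory=list)


def index_tree(root):
    """Preorder list of (node, start, end); the proper descendants of the
    node at position start are exactly positions start+1 .. end."""
    out = []

    def visit(n):
        pos = len(out)
        out.append(None)
        for c in n.children:
            visit(c)
        out[pos] = (n, pos, len(out) - 1)

    visit(root)
    return out


def embeds(pattern, data_root):
    """True if `pattern` has an unordered embedded occurrence in the data tree
    (assumed semantics: label equality plus attribute-set inclusion)."""
    nodes = index_tree(data_root)

    def match_at(p, i):
        d, start, end = nodes[i]
        if p.label != d.label or not p.attrs <= d.attrs:
            return False
        return place(p.children, 0, start + 1, end, [])

    def place(kids, k, lo, hi, used):
        # Assign pattern child k to a descendant whose interval does not
        # overlap the intervals already taken by earlier siblings.
        if k == len(kids):
            return True
        for j in range(lo, hi + 1):
            _, s, e = nodes[j]
            if any(not (e < us or ue < s) for us, ue in used):
                continue  # same node as, or ancestor-related to, a sibling image
            if match_at(kids[k], j):
                used.append((s, e))
                if place(kids, k + 1, lo, hi, used):
                    return True
                used.pop()  # backtrack and try another descendant
        return False

    return any(match_at(pattern, i) for i in range(len(nodes)))


# Hypothetical data tree: book -> (chapter -> section), author
data = Node("book", frozenset({"lang=en", "year=2018"}), [
    Node("chapter", frozenset(), [Node("section", frozenset({"topic=mining"}))]),
    Node("author", frozenset({"country=US"})),
])

# section is a descendant (not a child) of book, so this pattern is embedded.
p_ok = Node("book", frozenset({"lang=en"}), [
    Node("section", frozenset({"topic=mining"})),
    Node("author"),
])
# The attribute set {lang=fr} is not contained in the book node's attributes.
p_attr = Node("book", frozenset({"lang=fr"}))
# chapter and section are ancestor-related in the data, so they cannot be siblings.
p_sib = Node("book", frozenset(), [Node("chapter"), Node("section")])

print(embeds(p_ok, data), embeds(p_attr, data), embeds(p_sib, data))
```

The check enumerates candidate images by backtracking, so it is only suitable for small examples; a mining algorithm would additionally have to enumerate candidate tree shapes and attribute sets and count their support, which is where the combined search space described above arises.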

Author information

Authors and Affiliations

Computer School, Wuhan University, Wuhan, China
Xiaoying Wu
New Jersey Institute of Technology, Newark, USA
Dimitri Theodoratos

Corresponding author

Correspondence to Dimitri Theodoratos.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

About this paper

Cite this paper

Wu, X., Theodoratos, D. (2018). Efficient Discovery of Embedded Patterns from Large Attributed Trees. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10828. Springer, Cham. https://doi.org/10.1007/978-3-319-91458-9_34

DOI: https://doi.org/10.1007/978-3-319-91458-9_34
Published: 12 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91457-2
Online ISBN: 978-3-319-91458-9
eBook Packages: Computer Science, Computer Science (R0)
