
Efficient Discovery of Embedded Patterns from Large Attributed Trees

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10828)


Abstract

Discovering informative patterns deeply hidden in large tree datasets is an important research area that has many practical applications. Many modern applications and systems represent, export and exchange data in the form of trees whose nodes are associated with attributes. In this paper, we address the problem of mining frequent embedded attributed patterns from large attributed data trees. Attributed pattern mining requires combining tree mining and itemset mining. This results in exploring a larger pattern search space than when each problem is addressed separately. We first design an interleaved pattern mining approach which extends the equivalence-class-based tree pattern enumeration technique with attribute-set enumeration. Further, we propose a novel layered approach to discover all frequent attributed patterns in stages. This approach seamlessly integrates an itemset mining technique with a recent unordered embedded tree pattern mining algorithm to greatly reduce the pattern search space. Our extensive experimental results on real and synthetic large-tree datasets show that the layered approach displays, in most cases, orders-of-magnitude performance improvements over both the interleaved mining method and the attribute-as-node embedded tree pattern mining method, and that it has good scale-up properties.
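To make the notion of an embedded attributed pattern more concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a simplified notion of occurrence: pattern nodes map injectively to data nodes, each pattern node's attribute set must be a subset of the attribute set of its image, and each parent-child edge of the pattern must map to a proper ancestor-descendant pair in the data tree. Support is assumed here to be the number of data nodes at which the pattern root can be placed. The names `Node`, `descendants`, `occurs_at` and `support` are hypothetical, and the paper's formal definitions may differ.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Set, Tuple


@dataclass
class Node:
    """A tree node carrying a set of attributes (hypothetical representation)."""
    attrs: FrozenSet[str]
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield every proper descendant of `node` in preorder."""
    for child in node.children:
        yield child
        yield from descendants(child)


def _place(todo: List[Tuple[Node, Node]], used: Set[int]) -> bool:
    """Backtracking search: each (pattern_node, anchor) pair in `todo` must be
    mapped to an unused proper descendant of `anchor` whose attributes
    include the pattern node's attributes."""
    if not todo:
        return True
    (p, anchor), rest = todo[0], todo[1:]
    for d in descendants(anchor):
        if id(d) in used or not p.attrs <= d.attrs:
            continue
        # Children of p must later be placed strictly below d.
        if _place(rest + [(c, d) for c in p.children], used | {id(d)}):
            return True
    return False


def occurs_at(pattern: Node, d: Node) -> bool:
    """True if the pattern can be mapped into the data tree with its root at d."""
    if not pattern.attrs <= d.attrs:
        return False
    return _place([(c, d) for c in pattern.children], {id(d)})


def support(pattern: Node, data_root: Node) -> int:
    """Count data nodes that can serve as the image of the pattern root
    (an assumed, root-occurrence style support measure)."""
    nodes = [data_root, *descendants(data_root)]
    return sum(1 for d in nodes if occurs_at(pattern, d))


# A small attributed data tree: {a} -> ({b,c} -> {d}), ({b} -> {d,e})
data = Node(frozenset({"a"}), [
    Node(frozenset({"b", "c"}), [Node(frozenset({"d"}))]),
    Node(frozenset({"b"}), [Node(frozenset({"d", "e"}))]),
])

b_over_d = Node(frozenset({"b"}), [Node(frozenset({"d"}))])
a_over_d = Node(frozenset({"a"}), [Node(frozenset({"d"}))])
a_over_two_d = Node(frozenset({"a"}),
                    [Node(frozenset({"d"})), Node(frozenset({"d"}))])

print(support(b_over_d, data))      # 2
print(support(a_over_d, data))      # 1 (the edge spans two levels)
print(support(a_over_two_d, data))  # 1 (two distinct nodes carry d)
```

The sketch tests a single candidate by exhaustive backtracking, so it only illustrates why the candidate space grows: every tree shape has to be considered together with every combination of attribute subsets on its nodes. It does not attempt the enumeration or pruning strategies that the abstract attributes to the interleaved and layered approaches.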


Notes

  1. http://www.cs.rpi.edu/~zaki/software/
  2. http://xml-benchmark.org
  3. http://dblp.uni-trier.de/xml/
  4. https://opennlp.apache.org/


Author information

Corresponding author

Correspondence to Dimitri Theodoratos.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wu, X., Theodoratos, D. (2018). Efficient Discovery of Embedded Patterns from Large Attributed Trees. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10828. Springer, Cham. https://doi.org/10.1007/978-3-319-91458-9_34

  • DOI: https://doi.org/10.1007/978-3-319-91458-9_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91457-2

  • Online ISBN: 978-3-319-91458-9

  • eBook Packages: Computer Science, Computer Science (R0)
