Robust Cardinality Estimation for Subgraph Isomorphism Queries on Property Graphs

  • Conference paper
Biomedical Data Management and Graph Online Querying (Big-O(Q) 2015, DMAH 2015)

Abstract

With the increasing popularity of graph data and graph processing systems, the need for efficient graph processing and graph query optimization is becoming more important. Subgraph isomorphism queries, one of the fundamental graph query types, rely on accurate cardinality estimation for a single edge of a pattern for efficient query processing. State-of-the-art approaches do not consider two important aspects of cardinality estimation for graph queries on property graphs: the existence of nodes with a high outdegree and functional dependencies between attributes. In this paper, we focus on these two challenges and integrate the detection of high-outdegree nodes and functional dependency analysis into the cardinality estimation. We evaluate our approach on two real data sets and compare it against a state-of-the-art query optimizer for property graphs as implemented in Neo4j.

Notes

  1. We use the latest available version: Neo4j 2.3.0-M2.

Author information

Corresponding author

Correspondence to Marcus Paradies.

A Evaluated Queries

Table 1. Query templates used in the evaluation.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Paradies, M., Vasilyeva, E., Mocan, A., Lehner, W. (2016). Robust Cardinality Estimation for Subgraph Isomorphism Queries on Property Graphs. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015, DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_14

  • DOI: https://doi.org/10.1007/978-3-319-41576-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41575-8

  • Online ISBN: 978-3-319-41576-5

  • eBook Packages: Computer Science, Computer Science (R0)
