Physical Optimization

Groppe, Sven

doi:10.1007/978-3-642-19357-6_6



Abstract

Different algorithms exist to compute the result of a logical operator such as AND, OPT, or SORT. A physical operator implements one of these algorithms. Physical operators differ: some impose constraints on their input data (for example, that it must be sorted), and some are faster than others for particular kinds of input data (for example, when the input fits into main memory). The context of an operator can be described by estimates of the properties of its input data. For each (logical) operator in the operator graph, physical optimization aims to choose the physical operator with the lowest estimated execution time in that operator’s context.
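The choice described above can be sketched for the logical AND (join) operator. This is a minimal illustration, not the chapter's optimizer: the `Context` fields, the three candidate operators, and their cost formulas are hypothetical stand-ins for real cardinality and sortedness estimates. Each physical operator carries an applicability constraint on its input and an estimated cost; the optimizer keeps the applicable ones and picks the cheapest.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    """Estimated properties of a join's input data (hypothetical fields)."""
    left_rows: int            # estimated cardinality of the left input
    right_rows: int           # estimated cardinality of the right input
    sorted_on_join_key: bool  # are both inputs already sorted on the join variables?
    fits_in_memory: bool      # does the smaller input fit into main memory?


@dataclass
class PhysicalOperator:
    name: str
    applicable: Callable[[Context], bool]  # constraint on the input data
    cost: Callable[[Context], float]       # estimated execution time (toy formula)


def _sort_cost(n: int) -> float:
    # Comparison sort: roughly n * log2(n) comparisons.
    return n * math.log2(max(n, 2))


# Toy cost formulas for illustration only; a real optimizer calibrates them.
JOIN_OPERATORS: List[PhysicalOperator] = [
    PhysicalOperator("MergeJoin",
                     applicable=lambda c: c.sorted_on_join_key,
                     cost=lambda c: float(c.left_rows + c.right_rows)),
    PhysicalOperator("HashJoin",
                     applicable=lambda c: c.fits_in_memory,
                     cost=lambda c: 2.0 * (c.left_rows + c.right_rows)),
    PhysicalOperator("SortMergeJoin",  # always applicable: sort both inputs first
                     applicable=lambda c: True,
                     cost=lambda c: (_sort_cost(c.left_rows) + _sort_cost(c.right_rows)
                                     + c.left_rows + c.right_rows)),
]


def choose_physical_operator(ctx: Context,
                             candidates: List[PhysicalOperator]) -> PhysicalOperator:
    """Pick the applicable physical operator with the lowest estimated cost."""
    applicable = [op for op in candidates if op.applicable(ctx)]
    return min(applicable, key=lambda op: op.cost(ctx))
```

With these toy costs, sorted inputs lead to `MergeJoin`, unsorted inputs that fit into memory lead to `HashJoin`, and everything else falls back to `SortMergeJoin`, mirroring how the same logical operator maps to different physical operators in different contexts.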

Besides describing the physical operators, in this chapter we present our new approaches to efficient RDF data management and join optimization, both for small datasets and for large-scale datasets with over one billion triples.

For small datasets, where the data can be indexed in main memory, in-memory indices can significantly speed up query processing because (after the data has been loaded) no disk accesses are needed during query processing. B⁺-trees are designed for disk-based indices of large-scale datasets, as they are optimized for blockwise sequential disk accesses. For main-memory indices, hash indices are preferable because an index access takes expected constant time: only a hash function must be applied to the key to retrieve the (main-memory) address of the indexed element. Therefore, we use hash indices to manage small RDF datasets. Exploiting the triple structure of RDF data, we create seven hash indices in order to retrieve in-memory RDF data quickly. Based on SPARQL-specific properties and the seven indices, we develop a new, efficient approach to computing joins by dynamically restricting triple patterns. A performance evaluation demonstrates that the new approach outperforms other state-of-the-art in-memory databases.
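The following is a minimal sketch, assuming that the seven hash indices are keyed by the seven non-empty combinations of triple positions (S, P, O, SP, SO, PO, SPO), so that every triple pattern with at least one constant can be answered by a single hash lookup. The join shown is one plausible reading of "dynamically restricting triple patterns", namely an index-nested-loop evaluation in which the bindings found so far are substituted into the next pattern before it is looked up; it is not necessarily the chapter's algorithm. `SevenHashIndices`, `extend`, and `evaluate_bgp` are hypothetical names, and variables are written as terms starting with `?`.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]   # terms starting with "?" are variables
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


class SevenHashIndices:
    """One hash index per non-empty subset of positions: S, P, O, SP, SO, PO, SPO."""

    def __init__(self, triples: List[Triple]) -> None:
        self.triples = list(triples)
        self.indices: Dict[Tuple[int, ...], Dict[Tuple[str, ...], List[Triple]]] = {}
        for size in (1, 2, 3):
            for positions in combinations((0, 1, 2), size):
                index: Dict[Tuple[str, ...], List[Triple]] = defaultdict(list)
                for t in self.triples:
                    index[tuple(t[i] for i in positions)].append(t)
                self.indices[positions] = index

    def lookup(self, pattern: Pattern) -> List[Triple]:
        """Use the index whose key covers exactly the constant positions of the pattern."""
        bound = tuple(i for i in range(3) if not is_var(pattern[i]))
        if not bound:
            return self.triples  # no constants: every triple is a candidate
        key = tuple(pattern[i] for i in bound)
        return self.indices[bound].get(key, [])  # .get does not insert into a defaultdict


def extend(binding: Binding, pattern: Pattern, triple: Triple) -> Optional[Binding]:
    """Bind the pattern's variables to the triple; None on a conflicting repeated variable."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term) and new.setdefault(term, value) != value:
            return None
    return new


def evaluate_bgp(store: SevenHashIndices, patterns: List[Pattern]) -> List[Binding]:
    """Join triple patterns left to right, restricting each by the bindings found so far."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            # Substitute already-bound variables: the pattern gets more constants,
            # so a more selective index (e.g. SP instead of P) can be used.
            restricted = tuple(b.get(t, t) if is_var(t) else t for t in pattern)
            for t in store.lookup(restricted):
                nb = extend(b, restricted, t)
                if nb is not None:
                    next_bindings.append(nb)
        bindings = next_bindings
    return bindings
```

For the query `?x knows ?y . ?y name ?n`, the first pattern is answered from the P index; after binding `?y` to `bob`, the second pattern becomes `bob name ?n` and is answered from the SP index in one hash lookup.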

Since Semantic Web datasets are becoming increasingly large, developing efficient techniques to speed up querying large-scale Semantic Web data is a key issue for Semantic Web applications. It is known from relational database research that merge joins are the fastest join algorithms on large-scale data when the input is already sorted. Therefore, recent approaches presort Semantic Web data during index construction, so that for some joins the fast merge join can be used without a sorting phase at runtime. When the input of subsequent joins is no longer sorted, the hash join is typically used instead. In this chapter, we propose a sorting numbering scheme for large RDF datasets, with which we can quickly sort any intermediate and final query results. Applying our sorting numbering scheme, all joins can be computed using the merge join with a fast sorting phase. Besides significantly benefiting merge joins, our fast sorting technique can also remarkably speed up the elimination of duplicates. Our experiments show that a merge join using our fast sorting technique greatly outperforms the hash join, and that our sorting numbering scheme, integrated into any index approach, significantly speeds up querying large-scale Semantic Web data.
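The abstract does not spell out how the sorting numbering scheme works. The sketch below assumes that RDF terms receive dense integer IDs in their sort order, so that ordering rows by ID equals ordering them by term, and that a fast sorting phase can then work on integers instead of strings. The use of counting sort (O(n + k) per column, where k is the number of IDs) is our own illustrative choice; for dictionaries with billions of terms a radix sort over ID digits would be more practical. All function names are hypothetical. The sketch sorts intermediate results, joins them with a merge join, and eliminates duplicates on the sorted rows.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, ...]  # an intermediate result row of term IDs


def build_sorted_dictionary(terms: Iterable[str]) -> Dict[str, int]:
    """Assign IDs 0..k-1 in term order, so id(a) < id(b) exactly when a < b."""
    return {term: i for i, term in enumerate(sorted(set(terms)))}


def counting_sort(rows: List[Row], col: int, num_ids: int) -> List[Row]:
    """Stable O(n + num_ids) sort of rows on one column; IDs lie in [0, num_ids)."""
    starts = [0] * (num_ids + 1)
    for r in rows:
        starts[r[col] + 1] += 1
    for i in range(num_ids):
        starts[i + 1] += starts[i]  # starts[k] = number of rows with ID < k
    out: List[Row] = [()] * len(rows)
    for r in rows:
        out[starts[r[col]]] = r
        starts[r[col]] += 1
    return out


def merge_join(left: List[Row], right: List[Row], lcol: int, rcol: int) -> List[Row]:
    """Merge join of inputs sorted on their join columns; output = left row + right row."""
    result: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lcol], right[j][rcol]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:  # equal keys: emit the cross product of both runs
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][lcol] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][rcol] == rk:
                j_end += 1
            result.extend(lrow + rrow for lrow in left[i:i_end] for rrow in right[j:j_end])
            i, j = i_end, j_end
    return result


def distinct(rows: List[Row], num_ids: int) -> List[Row]:
    """Sort whole rows (LSD radix: last column first), then drop adjacent duplicates."""
    for col in reversed(range(len(rows[0]) if rows else 0)):
        rows = counting_sort(rows, col, num_ids)
    out: List[Row] = []
    for r in rows:
        if not out or out[-1] != r:
            out.append(r)
    return out
```

Because the IDs preserve term order, any intermediate result can be re-sorted on whichever join column the next merge join needs, which is what lets every join in a plan be a merge join; duplicates then sit next to each other, so eliminating them is a single linear pass.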



Author information

Authors and Affiliations

Institute of Information Systems, University of Lübeck, Ratzeburger Allee 160 (Building 64 - 2nd level), 23562, Lübeck, Germany
Sven Groppe


Corresponding author

Correspondence to Sven Groppe.



About this chapter

Cite this chapter

Groppe, S. (2011). Physical Optimization. In: Data Management and Query Processing in Semantic Web Databases. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19357-6_6


DOI: https://doi.org/10.1007/978-3-642-19357-6_6
Published: 25 March 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19356-9
Online ISBN: 978-3-642-19357-6
eBook Packages: Computer Science; Computer Science (R0)
