Abstract

Several algorithms can compute the result of a logical operator such as AND, OPT, or SORT. A physical operator implements one of these algorithms. Physical operators can differ in the constraints they place on the input data (for example, that the input must be sorted), and some are faster than others for particular kinds of input, for example, when the input data fit into main memory. The context of an operator can be described by estimates of the properties of its input data. For each logical operator in the operator graph, physical optimization aims to choose the physical operator with the best estimated execution time in that operator’s context.
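The selection step described above can be sketched as follows. This is a minimal illustration, not the chapter's optimizer: the `InputEstimate` fields, the three candidate operators, the applicability rules, and the toy cost formulas are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InputEstimate:
    """Estimated properties of a join's inputs (hypothetical fields)."""
    left_rows: int
    right_rows: int
    sorted_on_join_key: bool
    fits_in_memory: bool


@dataclass
class PhysicalOperator:
    name: str
    applicable: Callable[[InputEstimate], bool]  # constraint on the input
    cost: Callable[[InputEstimate], float]       # estimated execution time


def _sort_cost(rows: int) -> float:
    rows = max(rows, 1)
    return rows * math.log2(rows)


# Toy cost models (assumptions): linear merge, slightly more expensive
# hashing, and an n log n sorting phase before a merge.
JOIN_CANDIDATES: List[PhysicalOperator] = [
    PhysicalOperator(
        "merge join",
        lambda e: e.sorted_on_join_key,
        lambda e: e.left_rows + e.right_rows,
    ),
    PhysicalOperator(
        "hash join",
        lambda e: e.fits_in_memory,
        lambda e: 1.5 * (e.left_rows + e.right_rows),
    ),
    PhysicalOperator(
        "sort-merge join",
        lambda e: True,
        lambda e: _sort_cost(e.left_rows) + _sort_cost(e.right_rows)
        + e.left_rows + e.right_rows,
    ),
]


def choose_physical_operator(candidates, context):
    """Pick the applicable operator with the lowest estimated cost."""
    usable = [op for op in candidates if op.applicable(context)]
    if not usable:
        raise ValueError("no physical operator satisfies the input constraints")
    return min(usable, key=lambda op: op.cost(context))
```

With these assumed costs, sorted inputs lead to the plain merge join, unsorted inputs that fit into main memory lead to the hash join, and the sort-merge join is the fallback.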

As well as describing the physical operators, this chapter presents our new approaches to efficient RDF data management and join optimization, both for small datasets and for large-scale datasets with over one billion triples.

For small datasets, where the data can be indexed in main memory, in-memory indices can significantly speed up query processing because, once the data have been loaded, query processing requires no disk accesses. B+-trees are designed for disk-based indices of large-scale datasets, as they are optimized for blockwise sequential disk access. For main-memory indices, hash indices are preferable because an index access takes (expected) constant time: only a hash function must be applied to the key to retrieve the main-memory address of the indexed element. Therefore, we use hash indices to manage small RDF datasets. Based on the triple nature of RDF data, we create seven hash indices in order to retrieve in-memory RDF data quickly. On the basis of SPARQL-specific properties and the seven indices, we develop a new, efficient approach to computing joins by dynamically restricting triple patterns. A performance evaluation demonstrates that the new approach outperforms other state-of-the-art in-memory databases.
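One way to picture the seven hash indices and the restriction of triple patterns is the sketch below. It assumes that the seven indices correspond to the seven non-empty combinations of subject, predicate, and object, and that "dynamically restricting" means substituting already-bound variables into the next pattern before the index lookup. Both are readings of the abstract, and the names `TripleStore`, `match`, and `join` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

POSITIONS = (0, 1, 2)  # subject, predicate, object


def is_var(term):
    """Variables are written SPARQL-style, e.g. '?x' (assumed convention)."""
    return term.startswith("?")


class TripleStore:
    def __init__(self, triples):
        self.triples = list(triples)
        # One hash index per non-empty subset of positions: 3 + 3 + 1 = 7.
        self.indices = {
            combo: defaultdict(list)
            for size in (1, 2, 3)
            for combo in combinations(POSITIONS, size)
        }
        for triple in self.triples:
            for combo, index in self.indices.items():
                index[tuple(triple[i] for i in combo)].append(triple)

    def match(self, pattern):
        """Return all triples matching the constant positions of a pattern."""
        bound = tuple(i for i in POSITIONS if not is_var(pattern[i]))
        if not bound:
            return list(self.triples)  # nothing bound: full scan
        key = tuple(pattern[i] for i in bound)
        return self.indices[bound].get(key, [])


def join(store, patterns):
    """Join triple patterns by restricting each one with earlier bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Replace variables that are already bound by their values.
            restricted = tuple(binding.get(term, term) for term in pattern)
            for triple in store.match(restricted):
                extended = dict(binding)
                consistent = True
                for term, value in zip(restricted, triple):
                    if is_var(term) and extended.setdefault(term, value) != value:
                        consistent = False  # same variable, different values
                        break
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions
```

Because every restricted pattern is answered by a single hash lookup on exactly its bound positions, no intermediate result has to be scanned or sorted in this sketch.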

Since Semantic Web datasets are becoming increasingly large, developing efficient techniques to speed up querying of large-scale Semantic Web data is a key issue for Semantic Web applications. From relational database research, merge joins are known to be the fastest join algorithms on large-scale data when the data are already sorted. Therefore, recent approaches focus on presorting Semantic Web data during index construction, so that the fast merge join can be used for some joins without a sorting phase at runtime. When the data for succeeding joins become unsorted, the hash join is typically used. In this chapter, we propose a sorting numbering scheme for large RDF datasets, based on which we can quickly sort any intermediate and final query results. Applying our sorting numbering scheme, all joins can be computed using the merge join with a fast sorting phase. Besides being a significant benefit to merge joins, our fast sorting technique can also remarkably speed up the elimination of duplicates. Our experiments show that a merge join using our fast sorting technique greatly outperforms the hash join and that our sorting numbering scheme, integrated into any index approach, significantly speeds up querying large-scale Semantic Web data.
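The interplay of numbering, merge join, and duplicate elimination can be illustrated with a small sketch. The abstract does not specify the sorting numbering scheme, so the example simply assumes that each RDF term is mapped to an integer ID assigned in the sort order of the terms, so that comparing IDs is equivalent to comparing terms. The function names are hypothetical.

```python
def build_order_preserving_ids(terms):
    """Assign integer IDs so that ID order equals term order (assumption)."""
    return {term: i for i, term in enumerate(sorted(set(terms)))}


def merge_join(left, right, key_left, key_right):
    """Join two lists of integer tuples, each sorted on its join column."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on both sides and combine them.
            i_end = i
            while i_end < len(left) and left[i_end][key_left] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    result.append(l_row + r_row)
            i, j = i_end, j_end
    return result


def distinct_sorted(rows):
    """Eliminate duplicates from sorted rows in a single pass."""
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out


# Usage: encode terms, sort each input on its join column, then merge.
ids = build_order_preserving_ids(["bob", "alice", "carol"])
knows = [(ids["bob"], ids["carol"]), (ids["alice"], ids["bob"])]
left = sorted(knows, key=lambda row: row[1])   # sorted on the object column
right = sorted(knows, key=lambda row: row[0])  # sorted on the subject column
joined = merge_join(left, right, key_left=1, key_right=0)
```

Sorting small integers is cheaper than sorting strings, which is the intuition behind making the sorting phase fast; once the rows are sorted, duplicates are adjacent and can be dropped in one linear pass.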

Author information

Corresponding author

Correspondence to Sven Groppe.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Groppe, S. (2011). Physical Optimization. In: Data Management and Query Processing in Semantic Web Databases. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19357-6_6

  • DOI: https://doi.org/10.1007/978-3-642-19357-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19356-9

  • Online ISBN: 978-3-642-19357-6

  • eBook Packages: Computer Science (R0)
