Semantic Partitioning for RDF Datasets

Troullinou, Georgia; Kondylakis, Haridimos; Plexousakis, Dimitris

doi:10.1007/978-3-319-68282-2_7

Georgia Troullinou¹²,
Haridimos Kondylakis¹² &
Dimitris Plexousakis¹²

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 760))

Included in the following conference series:

International Workshop on Information Search, Integration, and Personalization

266 Accesses
1 Citations

Abstract

Today we are witnessing an explosion in the size and the amount of the available RDF datasets. As such, conventional single node RDF management systems give their position to clustered ones. However most of the currently available clustered RDF database systems partition data using hash functions and/or vertical and horizontal partition algorithms with a significant impact on the number of nodes required for query answering, increasing the total cost of query evaluation. In this paper we present a novel semantic partitioning approach, exploiting both the structure and the semantics of an RDF Dataset, for producing vertical partitions that significantly reduce the number of nodes that should be visited for query answering. To construct these partitions, first we select the most important nodes in a dataset as centroids, using the notion of relevance. Then we use the notion of dependence to assign each remaining node to the appropriate centroid. We evaluate our approach using three real world datasets and demonstrate the nice properties that the constructed partitions possess showing that they significantly reduce the total number of nodes required for query answering while introducing minimal storage overhead.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Notes

References

Aluç, G., Özsu, M.T., Daudjee, K.: Clustering RDF databases using tunable-LSH. CoRR abs/1504.02523 (2015)
Google Scholar
Álvarez-García, S., Brisaboa, N.R., Fernández, J.D., Martínez-Prieto, M.A., Navarro, G.: Compressed vertical partitioning for efficient RDF management. Knowl. Inf. Syst. 44(2), 439–474 (2015)
Article Google Scholar
Bui, T.N., Moon, B.R.: Genetic algorithm and graph partitioning. IEEE Trans. Comput. 45(7), 841–855 (1996)
Article MathSciNet Google Scholar
Fiduccia, C.M., et al.: A linear-time heuristic for improving network partitions. In: DAC (1982)
Google Scholar
Garey, M.R., Johnson, D.S., Stockmeyer, L.J.: Some simplified NP-complete problems. In: STOC (1974)
Google Scholar
Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Aberer, K., et al. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 211–224. Springer, Heidelberg (2007). doi:10.1007/978-3-540-76298-0_16
Chapter Google Scholar
Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
Google Scholar
Johnson, D.S., Aragon, C.R., McGeoch, L.A., et al.: Optimization by simulated annealing: an experimental evaluation. part i, graph partitioning. Oper. Res. 37, 865–892 (1989)
Article Google Scholar
Hendrickson, B., Leland, R.: The chaco user’s guid, version 2.0. Technical report SAND94–2692, Sandia National Laboratories (1995)
Google Scholar
Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)
Article Google Scholar
Kaufman, L., Rousseeuw, P.J.: Clustering by means of medoids. Statistical Data Analysis based on the L1 Norm (1987)
Google Scholar
Karvounarakis, G., Alexaki, S., Christophides, V., Plexousakis, D., Scholl, M.: RQL: a declarative query language for RDF. In: WWW (2002)
Google Scholar
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359 (1999)
Article MathSciNet Google Scholar
Kellou-Menouer, K., Kedad, Z.: A clustering based approach for type discovery in RDF data sources. In: EGC (2015)
Google Scholar
Kernighan, B., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. J. 49, 291–307 (2013)
Article Google Scholar
Kondylakis, H., Spanakis, M., Sfakianakis, S., et al.: Digital patient: personalized and translational data management through the MyHealthAvatar EU project. In: EMBC (2015)
Google Scholar
Lee, K., Liu, L.: Scaling queries over big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013)
Google Scholar
Lee, K., Liu, L., Tang, Y., Zhang, Q., Zhou, Y.: Efficient and customizable data partitioning framework for distributed big RDF data processing in the cloud. In: CLOUD (2013)
Google Scholar
Leng, Y., Chen, Z., Zhong, F., Zhong, H.: BRDPHHC: A Balance RDF data partitioning algorithm based on hybrid hierarchical clustering. In: HPCC/CSS/ICESS (2015)
Google Scholar
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
Article Google Scholar
Pappas, A., Troullinou, G., Roussakis, G., Kondylakis, H., Plexousakis, D.: Exploring importance measures for summarizing RDF/S KBs. In: Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) ESWC 2017. LNCS, vol. 10249, pp. 387–403. Springer, Cham (2017). doi:10.1007/978-3-319-58068-5_24
Chapter Google Scholar
Pellegrini, F., Roman, J.: Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds.) HPCN-Europe 1996. LNCS, vol. 1067, pp. 493–498. Springer, Heidelberg (1996). doi:10.1007/3-540-61142-8_588
Chapter Google Scholar
Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store. In: PSI EtA, 4 (2010)
Google Scholar
Seddiqui, H., Nath, R.P.D., Aono, M.: An efficient metric of automatic weight generation for properties in instance matching technique. JWS 6(1), 1–17 (2015)
Google Scholar
Schmachtenberg, M., Bizer, C., Paulheim, H.: State of the LOD Cloud. http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/. Accessed 30 Apr 2016
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Article Google Scholar
The Cancer Genome Atlas project. http://cancergenome.nih.gov/. Accessed 30 Apr 2016
Theodoridou, M., Tzitzikas, Y., Doerr, M., et al.: Modeling and querying provenance by extending CIDOC CRM. Distrib. Parallel Databases 27(2), 169–210 (2010)
Article Google Scholar
Tian, Y., Hankins, R.A., Patel, J.M.: Efficient aggregation for graph summarization. In: SIGMOD (2008)
Google Scholar
Troullinou, G., Kondylakis, H., Daskalaki, E., Plexousakis, D.: RDF digest: efficient summarization of RDF/S KBs. In: Gandon, F., Sabou, M., Sack, H., d’Amato, C., Cudré-Mauroux, P., Zimmermann, A. (eds.) ESWC 2015. LNCS, vol. 9088, pp. 119–134. Springer, Cham (2015). doi:10.1007/978-3-319-18818-8_8
Chapter Google Scholar
Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014)
Google Scholar
Wang, X., Yang, T., Chen, J., He, L., Du, X.: RDF partitioning for scalable SPARQL query processing. Front. Comput. Sci. 9(6), 919–933 (2015)
Article Google Scholar
Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD (2007)
Google Scholar
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013)
Google Scholar
Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
Google Scholar

Download references

Acknowledgements

This research is implemented through IKY scholarships programme and co-financed by the European Union and Greek national funds through the action entitled “Reinforcement of Postdoctoral Researchers”, in the framework of the Operational Programme “Human Resources Development Program, Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) 2014−2020.

Author information

Authors and Affiliations

Institute of Computer Science, FORTH, N. Plastira 100, Heraklion, Greece
Georgia Troullinou, Haridimos Kondylakis & Dimitris Plexousakis

Authors

Georgia Troullinou
View author publications
You can also search for this author in PubMed Google Scholar
Haridimos Kondylakis
View author publications
You can also search for this author in PubMed Google Scholar
Dimitris Plexousakis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Haridimos Kondylakis .

Editor information

Editors and Affiliations

Saint Lab ETIS, Université de Cergy-Pontoise, Pontoise, France
Dimitris Kotzinos
Laboratoire ETIS, Université de Cergy Pontoise, Cergy Pontoise, France
Dominique Laurent
Dept Informatique INSA Lyon, Université de Lyon, Villeurbanne Cedex, France
Jean-Marc Petit
LRI, University of Paris, Orsay, France
Nicolas Spyratos
Graduate School of Information Science Knowledge Media Laboratory, Hokkaido University, Sapporo, Japan
Yuzuru Tanaka

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Troullinou, G., Kondylakis, H., Plexousakis, D. (2017). Semantic Partitioning for RDF Datasets. In: Kotzinos, D., Laurent, D., Petit, JM., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personlization. ISIP 2016. Communications in Computer and Information Science, vol 760. Springer, Cham. https://doi.org/10.1007/978-3-319-68282-2_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-68282-2_7
Published: 04 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68281-5
Online ISBN: 978-3-319-68282-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics