A First Step Towards a Streaming Linked Data Life-Cycle

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Abstract

Alongside the ongoing initiative of FAIR data management, the problem of handling Streaming Linked Data (SLD) is more relevant than ever. The Web is changing to tame Data Velocity and fulfill the needs of a new generation of Web applications. New protocols (e.g., WebSockets and Server-Sent Events) emerge to grant continuous and reactive data access. Under the Stream Reasoning initiative, the Semantic Web community has been actively working on query languages, engines, and vocabularies to address the scientific and technical challenges of taming Data Velocity without neglecting Data Variety. Nevertheless, a set of guidelines that showcase how to reuse existing resources to produce and consume streams on the Web is still missing. In this paper, we walk through the life-cycle of streaming linked data. We discuss the challenges of applying FAIR principles when publishing data streams. Moreover, we contextualise the usage of prominent Semantic Web resources, i.e., TripleWave, R2RML/RML, VoCaLS, and RSP-QL. We apply the guidelines to three representative examples of real-world Web streams: DBpedia Live changes, Wikimedia EventStreams, and the Global Database of Events, Language and Tone (GDELT). Last but not least, we open-sourced our code at https://w3id.org/webstreams.

Notes

  1. https://www.pubnub.com/learn/glossary/what-is-http-streaming/.
  2. https://developer.mozilla.org/en-US/docs/Web/API/EventSource.
  3. https://www.w3.org/community/rsp/.
  4. https://wiki.dbpedia.org/online-access/DBpediaLive.
  5. https://stream.wikimedia.org.
  6. https://gdeltproject.org.
  7. https://www.w3.org/TR/vocab-dcat/.
  8. https://schema.org/DataFeed.
  9. https://schema.org/DataFeedItem.
  10. https://www.w3.org/TR/ld-bp/.
  11. https://www.w3.org/TR/cooluris/#cooluris.
  12. I.e., do not let the user understand the underlying infrastructure.
  13. https://www.w3.org/2001/sw/rdb2rdf/r2rml/.
  14. http://dbpedia-live.openlinksw.com/live/.
  15. Link to GDELT-Event_Codebook-V2.0.pdf.
  16. Link to GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf.
  17. http://motools.sourceforge.net/event/event.122.html.
  18. http://linkeddata.stream/ontologies/cameo.owl.
  19. http://linkeddata.stream/ontologies/gcam.owl.
  20. http://data.gdeltproject.org/documentation/GCAM-MASTER-CODEBOOK.TXT.
  21. http://linkeddata.stream/resource/dbl.
  22. http://linkeddata.stream/resource/recentchanges.
  23. Wikimedia EventStream Terms Of Service.
  24. https://cloud.google.com/bigquery/.

Acknowledgments

Dr. Tommasini acknowledges support from the European Social Fund via the IT Academy program.

Author information

Corresponding author

Correspondence to Riccardo Tommasini.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tommasini, R., Ragab, M., Falcetta, A., Valle, E.D., Sakr, S. (2020). A First Step Towards a Streaming Linked Data Life-Cycle. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_39

  • DOI: https://doi.org/10.1007/978-3-030-62466-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62465-1

  • Online ISBN: 978-3-030-62466-8

  • eBook Packages: Computer Science, Computer Science (R0)