Linking Named Entities in Dutch Historical Newspapers

  • Conference paper
Metadata and Semantics Research (MTSR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 672)

Abstract

We improved access to the collection of Dutch historical newspapers of the Koninklijke Bibliotheek by linking named entities in the newspaper articles to corresponding Wikidata descriptions by means of machine learning techniques and crowdsourcing. Indexing the Wikidata identifiers for named entities together with the newspaper articles opens up new possibilities for retrieving articles that mention these resources and searching the newspaper collection using semantic relations from Wikidata. In this paper we describe our steps so far in setting up this combination of entity linking, machine learning and crowdsourcing in our research environment as well as our planned activities aimed at improving the quality of the links and extending the semantic search capabilities.
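
The idea of indexing Wikidata identifiers alongside articles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the article identifiers, the in-memory index, and the hand-written statement list stand in for the real search index and for Wikidata itself, and none of it reflects the authors' actual implementation. It shows two lookups: articles that mention a given entity, and articles that mention any entity with a given semantic relation (here, "country is the Netherlands").

```python
from collections import defaultdict

# Hypothetical article identifiers with the Wikidata IDs an entity linker
# might have attached to them (toy data, not taken from the paper).
ARTICLE_LINKS = {
    "article-1": {"Q727"},          # Amsterdam
    "article-2": {"Q64"},           # Berlin
    "article-3": {"Q727", "Q64"},   # mentions both cities
}

# Tiny hand-written stand-in for Wikidata statements:
# (subject, property, object). P17 is the "country" property.
STATEMENTS = [
    ("Q727", "P17", "Q55"),   # Amsterdam, country, Netherlands
    ("Q64", "P17", "Q183"),   # Berlin, country, Germany
]


def build_index(article_links):
    """Invert article -> entities into entity -> set of articles."""
    index = defaultdict(set)
    for article_id, entity_ids in article_links.items():
        for entity_id in entity_ids:
            index[entity_id].add(article_id)
    return index


def articles_mentioning(index, entity_id):
    """Return the sorted articles linked to one Wikidata identifier."""
    return sorted(index.get(entity_id, set()))


def articles_by_relation(index, statements, prop, target):
    """Return the sorted articles that mention any entity whose
    `prop` statement points at `target`."""
    subjects = {s for s, p, o in statements if p == prop and o == target}
    hits = set()
    for subject in subjects:
        hits |= index.get(subject, set())
    return sorted(hits)


index = build_index(ARTICLE_LINKS)
print(articles_mentioning(index, "Q727"))
print(articles_by_relation(index, STATEMENTS, "P17", "Q55"))
```

In a production setting the statement list would presumably be replaced by queries against Wikidata, and the dictionary by a field in the search index; the sketch only shows why storing identifiers rather than surface strings makes relation-based retrieval possible.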

Author information

Corresponding author

Correspondence to Theo van Veen.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

van Veen, T., Lonij, J., Faber, W.J. (2016). Linking Named Entities in Dutch Historical Newspapers. In: Garoufallou, E., Subirats Coll, I., Stellato, A., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2016. Communications in Computer and Information Science, vol 672. Springer, Cham. https://doi.org/10.1007/978-3-319-49157-8_18

  • DOI: https://doi.org/10.1007/978-3-319-49157-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49156-1

  • Online ISBN: 978-3-319-49157-8

  • eBook Packages: Computer Science, Computer Science (R0)
