On the Reproducibility of the TAGME Entity Linking System

Hasibi, Faegheh; Balog, Krisztian; Bratsberg, Svein Erik

doi:10.1007/978-3-319-30671-1_32

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Part of the results are reproducible through the provided API, while the rest are not reproducible. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.

Notes

1. http://tagme.di.unipi.it/.
2. https://sites.google.com/site/sigirrigor/.
3. http://tagme.di.unipi.it/tagme_help.html and is also mentioned in [5, 18].
4. Personal communication with authors of [3, 8, 11].
5. http://acube.di.unipi.it/tagme-dataset/.
6. As explained later by the TAGME authors, they in fact used micro-averaging. This contradicts the referred paper [12], which explicitly defines \(P_{ann}\) and \(R_{ann}\) as being macro-averaged.
7. It was later explained by the TAGME authors that they actually used only 1.4M out of 2M snippets from Wiki-Disamb30, as Weka could not load more than that into memory. From Wiki-Annot30 they used all snippets; the difference is merely a matter of approximation.
8. https://archive.org/details/enwiki_20100408.
9. The proper implementation of link probability would result in lower values (as the denominator would be higher) and would likely require a different threshold value than what is suggested in [8]. This goes beyond the scope of our paper.
10. http://web-ngram.research.microsoft.com/erd2014/Datasets.aspx.
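
Note 6 turns on the difference between micro- and macro-averaging. The following is a minimal sketch of the two schemes, not the evaluation code of TAGME or of [12]. It assumes that each document's annotations can be represented as a set of entity identifiers and that an annotation counts as correct when it appears in the gold set; the function names, the toy data, and the convention of scoring an empty set as 0 are all hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple

# One document: (system annotations, gold annotations), each a set of entity IDs.
Doc = Tuple[Set[str], Set[str]]


def precision_recall(system: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall for a single document (0.0 for an empty set; an assumed convention)."""
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def macro_average(docs: List[Doc]) -> Tuple[float, float]:
    """Compute precision/recall per document, then average over documents."""
    per_doc = [precision_recall(system, gold) for system, gold in docs]
    n = len(per_doc)
    return (sum(p for p, _ in per_doc) / n, sum(r for _, r in per_doc) / n)


def micro_average(docs: List[Doc]) -> Tuple[float, float]:
    """Pool the counts over all documents, then compute precision/recall once."""
    true_positives = sum(len(system & gold) for system, gold in docs)
    n_system = sum(len(system) for system, _ in docs)
    n_gold = sum(len(gold) for _, gold in docs)
    precision = true_positives / n_system if n_system else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall


# Toy data (invented): two documents with different numbers of annotations.
docs: List[Doc] = [
    ({"A", "B"}, {"A"}),            # P = 1/2, R = 1
    ({"C"}, {"C", "D", "E"}),       # P = 1,   R = 1/3
]

macro_p, macro_r = macro_average(docs)
micro_p, micro_r = micro_average(docs)
print(f"macro: P={macro_p:.3f} R={macro_r:.3f}")
print(f"micro: P={micro_p:.3f} R={micro_r:.3f}")
```

On this toy input the two schemes disagree (macro precision 0.75 versus micro precision 2/3), which illustrates why the averaging choice has to be known before reported numbers can be compared.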

References

1. Carmel, D., Chang, M.-W., Gabrilovich, E., Hsu, B.-J.P., Wang, K.: ERD’14: Entity recognition and disambiguation challenge. SIGIR Forum 48(2), 63–77 (2014)
2. Ceccarelli, D., Lucchese, C., Orlando, S., Perego, R., Trani, S.: Dexter: An open source framework for entity linking. In: Proceedings of the Sixth International Workshop on Exploiting Semantic Annotations in Information Retrieval, pp. 17–20 (2013)
3. Ceccarelli, D., Lucchese, C., Orlando, S., Perego, R., Trani, S.: Learning relatedness measures for entity linking. In: Proceedings of CIKM 2013, pp. 139–148 (2013)
4. Chiu, Y.-P., Shih, Y.-S., Lee, Y.-Y., Shao, C.-C., Cai, M.-L., Wei, S.-L., Chen, H.-H.: NTUNLP approaches to recognizing and disambiguating entities in long and short text at the ERD challenge 2014. In: Proceedings of Entity Recognition & Disambiguation Workshop, pp. 3–12 (2014)
5. Cornolti, M., Ferragina, P., Ciaramita, M.: A framework for benchmarking entity-annotation systems. In: Proceedings of WWW 2013, pp. 249–260 (2013)
6. Cornolti, M., Ferragina, P., Ciaramita, M., Schütze, H., Rüd, S.: The SMAPH system for query entity recognition and disambiguation. In: Proceedings of Entity Recognition & Disambiguation Workshop, pp. 25–30 (2014)
7. Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of EMNLP-CoNLL 2007, pp. 708–716 (2007)
8. Ferragina, P., Scaiella, U.: TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of CIKM 2010, pp. 1625–1628 (2010)
9. Ferragina, P., Scaiella, U.: Fast and accurate annotation of short texts with Wikipedia pages. CoRR (2010). abs/1006.3498
10. Han, X., Sun, L., Zhao, J.: Collective entity linking in web text: A graph-based method. In: Proceedings of SIGIR 2011, pp. 765–774 (2011)
11. Hasibi, F., Balog, K., Bratsberg, S.E.: Entity linking in queries: tasks and evaluation. In: Proceedings of the ICTIR 2015, pp. 171–180 (2015)
12. Kulkarni, S., Singh, A., Ramakrishnan, G., Chakrabarti, S.: Collective annotation of Wikipedia entities in web text. In: Proceedings of KDD 2009, pp. 457–466 (2009)
13. Medelyan, O., Witten, I.H., Milne, D.: Topic indexing with Wikipedia. In: Proceedings of the AAAI WikiAI Workshop, pp. 19–24 (2008)
14. Meij, E., Balog, K., Odijk, D.: Entity linking and retrieval for semantic search. In: Proceedings of WSDM 2014, pp. 683–684 (2014)
15. Mihalcea, R., Csomai, A.: Wikify!: Linking documents to encyclopedic knowledge. In: Proceedings of CIKM 2007, pp. 233–242 (2007)
16. Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proceedings of CIKM 2008, pp. 509–518 (2008)
17. Milne, D., Witten, I.H.: An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In: Proceedings of AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, pp. 25–30 (2008)
18. Usbeck, R., Röder, M., Ngonga Ngomo, A.-C., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., Ferragina, P., Lemke, C., Moro, A., Navigli, R., Piccinno, F., Rizzo, G., Sack, H., Speck, R., Troncy, R., Waitelonis, J., Wesemann, L.: GERBIL: General entity annotator benchmarking framework. In: Proceedings of WWW 2015, pp. 1133–1143 (2015)

Acknowledgement

We would like to thank Paolo Ferragina and Ugo Scaiella for sharing the TAGME source code with us and for the insightful discussions and clarifications later on. We also thank Diego Ceccarelli for the discussion on link probability computation and for providing help with the Dexter API.

Author information

Authors and Affiliations

Norwegian University of Science and Technology, Trondheim, Norway
Faegheh Hasibi & Svein Erik Bratsberg
University of Stavanger, Stavanger, Norway
Krisztian Balog

Corresponding author

Correspondence to Faegheh Hasibi.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

Hasibi, F., Balog, K., Bratsberg, S.E. (2016). On the Reproducibility of the TAGME Entity Linking System. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_32

DOI: https://doi.org/10.1007/978-3-319-30671-1_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science (R0)
