1 Introduction

Associations are one of the building blocks of human intelligence, thinking, context formation and everyday communication [4], yet they are not well represented in currently published Linked Data datasets. This impedes AI research: without a ground truth of semantic entities that humans associate, we can neither analyse human associations in existing datasets nor train machines to learn graph patterns for them.

2 Related Work

Previously, we developed semantic games with a purpose that collect a ground truth of semantic associations (Linked Data Games [7], KnowledgeTestGame [5]) or rank existing triples by association strength (BetterRelations [6]). Along the lines of fact ranking ground truth datasets, other works such as WhoKnows [9] and more recently FRanCo [2] have been published. While fact ranking in general only focuses on existing facts, FRanCo in a first step also collected free-text facts about the entity in question, resulting in \(\sim 7.8\) K raw free-text facts and an NER mapping back to semantic entitiesFootnote 1. While these works can help collect new associations, the datasets generated in this paper are orders of magnitude larger, published in RDF, and each of their mappings has been manually verified in order to provide a high-precision ground truth for machine learning.

3 Edinburgh Associative Thesaurus as RDF

EAT [8], created in the 1970s, is a dataset of single free-text associations collected directly from humans. It consists of a well-connected network of \(\sim 788\) K raw associations, which form \(\sim 326\) K unique associations (unique stimulus-response pairs) between 8200 unique stimuli and \(\sim 22700\) unique responses.

About 5000 unique associations occur more than 20 times (167 K raw associations). In the remainder of this paper we refer to them as strong associations. An example of such a strong association is the one between stimulus “dog” and response “cat”, which occurred 57 out of 100 times.

As the EAT datasetFootnote 2 is not available as RDF, we create an association vocabularyFootnote 3 and use it to transform EAT into RDF (see example in Fig. 1). We formally model EAT as a multi-set of raw associations. Each raw association \(a \in EAT\) is a free-text stimulus-response pair: \(a = (s, r), s \in S, r \in R\). The union of all stimuli S and responses R forms the set of terms \(T = S \cup R\). Further, we define the count \(c_{s,r}\) as the number of occurrences of the raw association \((s, r)\), and the relative frequency \(f_{s,r} = c_{s,r} / \sum_{r' \in R} c_{s,r'}\) as the relative count of response r with respect to a fixed stimulus s over all responses to that stimulus. The resulting transformation of EAT into RDF consists of \(1\,674\,376\) triplesFootnote 4.
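The counts and relative frequencies defined above can be computed in a few lines of Python; note that the association pairs below are a toy illustration, not data taken from EAT:

```python
from collections import Counter

# Toy multi-set of raw (stimulus, response) associations; the real EAT
# contains ~788 K of these. The pairs here are illustrative only.
raw_associations = [
    ("dog", "cat"), ("dog", "cat"), ("dog", "bone"),
    ("pupil", "eye"), ("pupil", "eye"), ("pupil", "school"),
]

# c_{s,r}: number of occurrences of each unique association
counts = Counter(raw_associations)

# f_{s,r}: relative frequency of response r for a fixed stimulus s,
# i.e. c_{s,r} divided by the total number of responses to s
stimulus_totals = Counter(s for s, _ in raw_associations)
frequencies = {
    (s, r): c / stimulus_totals[s] for (s, r), c in counts.items()
}
```

By construction, the frequencies of all responses to a fixed stimulus sum to 1, which makes \(f_{s,r}\) directly comparable across stimuli with different numbers of responses.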

Fig. 1.

Example of the EAT association “pupil - eye” as RDF (left) and its mapping to the semantic association dbpam:pupil/eye between the DBpedia entities dbpedia:Pupil and dbpedia:Eye (right). Prefixes: a: <https://w3id.org/associations/vocab#>, eat: <http://www.eat.rl.ac.uk/#>, dbpedia: <http://dbpedia.org/resource/>, dbpam: <https://w3id.org/associations/mapping_eat_dbpedia#>

4 Mapping EAT to DBpedia

This section describes the process of mapping associations from EAT to equivalent semantic associations between pairs of DBpedia [1] entities. If we find two such entities, we call the relation between them a semantic association.

For example, let’s focus on the association “pupil - eye”, with URI eat:stimulus=pupil&response=eye in Fig. 1. We can identify two DBpedia entities, namely dbpedia:Pupil and dbpedia:Eye with the intended meaning of the association and create a new semantic association dbpam:pupil/eye with the corresponding links.
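The URI scheme visible in Fig. 1 can be sketched as follows. This is an illustration based on the prefixes given in the figure caption; the exact percent-encoding rules of the published dataset may differ:

```python
from urllib.parse import quote

# Namespace prefixes as given in the caption of Fig. 1
EAT = "http://www.eat.rl.ac.uk/#"
DBPAM = "https://w3id.org/associations/mapping_eat_dbpedia#"

def eat_uri(stimulus: str, response: str) -> str:
    # Free-text association URI, e.g. eat:stimulus=pupil&response=eye
    return f"{EAT}stimulus={quote(stimulus)}&response={quote(response)}"

def mapping_uri(stimulus: str, response: str) -> str:
    # Semantic association mapping URI, e.g. dbpam:pupil/eye
    return f"{DBPAM}{quote(stimulus)}/{quote(response)}"
```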

For the mapping we focused on the \(\sim 5000\) unique strong associations occurring more than 20 times (167 K raw associations), as they are more robust with respect to subjectivity, location and time dependency.

Based on experiences gained from a manual mapping of a random sampleFootnote 5, we were able to develop an automatic mapping approach with the following scoring components (non-exclusive case frequencies in the sample and examples in brackets), which uses the Wikipedia APIFootnote 6:

  • Composite phrases (22 %, e.g., “port - wine”): As a composite phrase is a name for a single semantic entity, it is a bad candidate for a semantic association (between two different semantic entities). Hence, if a search for Wikipedia articles (or redirect pages) containing both stimulus and response in their title is successful, the mapping’s score receives a strong punishment.

  • Adjectives & verbs vs. nouns (22 %, e.g., “unbound - free”): Due to Wikipedia’s nature as an encyclopaedia, adjectives and verbs are under-represented in contrast to nouns. To identify such cases, the stimulus and response are searched in WordNet [3], potentially resulting in multiple synset candidates for each. Mappings containing only synset candidates of type “noun” are preferred; the more synset candidates with types other than “noun” are found, the stronger the punishment for the mapping’s score.

  • Reflexive mappings/synonyms (18 %, e.g., “children - kids”): If the mappings of both the stimulus and the response result in the same semantic entity, the mapping’s score receives a strong punishment.

  • Plural words (20 %, e.g., “thumbs - fingers”): A simple stemming approach is used to compare the stimulus/response to the identified Wikipedia article titles after following redirects. If the match is close to perfect and only differs in singular/plural, the score only receives a slight punishment.

  • Disambiguation pages (16 %, e.g., “pod - pea”): If the mappings of stimulus or response result in a Wikipedia disambiguation page, the mapping’s score receives a strong punishment.
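The interplay of these heuristics can be sketched as a simple additive penalty model. The boolean/integer signals and all penalty weights below are hypothetical; the original system derives its signals from the Wikipedia API and WordNet, and its exact weighting is not specified here:

```python
# Illustrative sketch of the scoring heuristics described above.
# Signal names and penalty weights are invented for illustration.

def score_mapping(is_composite_phrase: bool,
                  non_noun_synsets: int,
                  is_reflexive: bool,
                  plural_only_mismatch: bool,
                  hits_disambiguation_page: bool) -> float:
    score = 1.0
    if is_composite_phrase:          # e.g. "port - wine" names one entity
        score -= 0.5                 # strong punishment
    score -= 0.1 * non_noun_synsets  # adjective/verb synsets found in WordNet
    if is_reflexive:                 # e.g. "children - kids" -> same entity
        score -= 0.5                 # strong punishment
    if plural_only_mismatch:         # e.g. "thumbs" vs. article "Thumb"
        score -= 0.05                # slight punishment
    if hits_disambiguation_page:     # e.g. "pod" -> disambiguation page
        score -= 0.5                 # strong punishment
    return score
```

Candidates would then be ranked by score, and only the top-scoring mappings passed on to human verification, mirroring the selection step described next.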

After applying the automatic mapping to the \(\sim 5000\) strong associations, the top scoring 1066 semantic association candidates (corresponding to \(\sim 34.2\) K raw associations) were selected for human verification.

In order to quickly verify the 1066 mapping candidates, a small web application was used: it shows the textual EAT association (stimulus - response) at the top and the abstracts of both mapped Wikipedia articles below, and asks the user whether both stimulus and response are correctly mapped.

The web application was used by 10 reviewers and allowed the verification of 790 of the 1066 mappings (each requiring 3 independent “Yes” ratings; corresponding to \(\sim 25.5\) K raw associations).
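The acceptance rule, three independent “Yes” ratings per mapping, amounts to a simple vote count over distinct reviewers. A minimal sketch with invented ratings data:

```python
from collections import defaultdict

# (mapping_id, reviewer, verdict) triples; the data below is invented
ratings = [
    ("pupil/eye", "r1", "Yes"), ("pupil/eye", "r2", "Yes"),
    ("pupil/eye", "r3", "Yes"),
    ("port/wine", "r1", "Yes"), ("port/wine", "r4", "No"),
]

# Collect the set of distinct reviewers who said "Yes" per mapping,
# so repeated votes by the same reviewer are not counted twice
yes_voters = defaultdict(set)
for mapping, reviewer, verdict in ratings:
    if verdict == "Yes":
        yes_voters[mapping].add(reviewer)

# A mapping is verified once 3 independent "Yes" ratings exist
verified = {m for m, voters in yes_voters.items() if len(voters) >= 3}
```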

For each of the 790 verified semantic associations, a mapping URI is created analogously to Fig. 1. The resulting mapping dataset, consisting of 4740 triples (six per mapping), can be downloadedFootnote 7 or simply dereferenced.

5 Conclusion and Outlook

In this paper we presented a transformation of the 788 K free-text associations of the Edinburgh Associative Thesaurus into an RDF dataset. Further, we presented a first mapping of its strong associations to semantic associations between DBpedia entities, resulting in 790 manually verified mappings corresponding to \(\sim 25.5\) K raw associations.

In the future we plan to conduct pattern learning based on the mapped semantic associations. As all generated datasets are publicly available, we also look forward to them being used as benchmark or ground truth datasets, for example for link prediction tasks.

This work was financed by the University of Kaiserslautern PhD scholarship program and the BMBF project MOM (Grant 01IW15002).