Taking Advantage of Discursive Properties for Validating Hierarchical Semantic Relations from Parallel Enumerative Structures

Kamel, Mouna; Trojahn, Cassia

doi:10.1007/978-3-319-47602-5_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9989)

Included in the following conference series:

European Semantic Web Conference

Abstract

This paper presents an approach for automatically validating candidate hierarchical relations extracted from parallel enumerative structures. It relies on the discursive properties of these structures and on the combination of resources of different natures: a semantic network and a distributional resource. The results show an accuracy between 0.50 and 0.67, with a gain of 0.11 when combining the two resources.

1 Introduction

Relation extraction is a key task in ontology learning from texts. The identification of candidate relations has been the subject of a large body of literature, and many approaches have been proposed (linguistic, statistical or hybrid, with or without learning methods). However, this step is error-prone (imprecise lexico-syntactic patterns, learning techniques with accuracy below 100 %, chaining of NLP tools in pre-processing steps, etc.). Validating candidate relations is therefore a crucial step before integrating them into semantic resources.

This paper concerns the validation of candidate hierarchical relations, the backbone of ontologies. While manual validation is a time-consuming task requiring domain experts as judges, automatic validation relies on external semantic resources (such as WordNet or BabelNet), which are usually not domain-specific, or on gold standards, which may suffer from imperfections or low domain coverage. Our proposal relies on hierarchical semantic relations extracted from parallel enumerative structures (hereafter PES) [4]. This choice is motivated by three reasons: (1) PES often carry hierarchical relations; (2) they are frequent in corpora, especially in scientific or encyclopedic texts, which are rich sources of semantic relations; and (3) they have well-established discursive properties that establish a semantic unit within the structure. The originality of our approach lies in exploiting the discursive properties of PES to disambiguate candidate relations, and in combining two complementary external resources: a semantic network and a distributional resource. While the semantic network validates candidate relations with a good level of precision, the distributional resource, which does not specify the nature of the relation but offers good coverage, allows new relations to emerge, which may in turn enrich the network itself. Although the approach is evaluated on French, it is reproducible for any other language.

2 Parallel Enumerative Structures

An enumerative structure is a textual structure that expresses hierarchical knowledge through several components: a primer, a list of at least two items constituting the enumeration, and possibly a conclusion. Different typologies have been proposed [3, 5]. Here, we consider enumerative structures whose items are functionally equivalent from a syntactic and rhetorical point of view (Fig. 1). From a discursive point of view, the items are independent in a given context: they are connected to each other by a multinuclear rhetorical relation (coordination), and the first item is linked to the primer by a nucleus-satellite relation (subordination) (Fig. 2). According to RST (Rhetorical Structure Theory) [2], if DU\(_{j}\) (DU standing for Discourse Unit) is subordinated to DU\(_i\), then each DU\(_k\) coordinated with DU\(_j\) is also subordinated to DU\(_i\). Thereby, N nucleus-satellite relations between DU\(_0\) (the primer) and DU\(_i\), for \(i = 1, \ldots, N\), can be inferred, where N is the number of items in the enumerative structure. These N relations can be specialised into N semantic relations \(R(H, h_i)_{i=1,\ldots,N}\) of the same nature, where H corresponds to a term of DU\(_0\) and \(h_i\) to a term of DU\(_i\). From Fig. 2, three relations can be identified: R(disease, Cholera), R(disease, Colorectal cancer) and R(disease, Diverticulitis).
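The propagation rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: discourse units are numbered integers (0 for the primer), a subordination pair (i, j) means that DU\(_j\) is subordinated to DU\(_i\), coordination pairs are treated as undirected, and the term attached to each unit (H or \(h_i\)) is assumed to be already extracted. The names `propagate_subordination` and `candidate_relations` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def propagate_subordination(sub: Set[Edge], coord: Set[Edge]) -> Set[Edge]:
    """Close `sub` under the rule: if DU_j is subordinated to DU_i, then every
    DU_k coordinated with DU_j is also subordinated to DU_i."""
    # A multinuclear (coordination) relation has no direction.
    sym = coord | {(b, a) for (a, b) in coord}
    derived = set(sub)
    changed = True
    while changed:  # iterate until no new subordination can be inferred
        changed = False
        for (i, j) in list(derived):
            for (a, k) in sym:
                if a == j and k != i and (i, k) not in derived:
                    derived.add((i, k))
                    changed = True
    return derived


def candidate_relations(terms: Dict[int, str], sub: Set[Edge],
                        coord: Set[Edge]) -> List[Tuple[str, str]]:
    """Specialise the primer-to-item subordinations into relations R(H, h_i)."""
    derived = propagate_subordination(sub, coord)
    # Keep only subordinations anchored on the primer DU_0, ordered by item index.
    return [(terms[0], terms[k]) for (i, k) in sorted(derived) if i == 0]


# Fig. 2 example: DU_1 is subordinated to the primer; DU_2 and DU_3 are coordinated with DU_1.
terms = {0: "disease", 1: "Cholera", 2: "Colorectal cancer", 3: "Diverticulitis"}
print(candidate_relations(terms, sub={(0, 1)}, coord={(1, 2), (1, 3)}))
```

Running it yields the three relations listed above. The fixpoint loop also handles coordination stated as a chain (DU\(_1\)–DU\(_2\), DU\(_2\)–DU\(_3\)) rather than as links to the first item.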

3 Proposed Approach

The validation principle exploits the discourse properties of PES to jointly validate the relations \(R(H, h_i)\), \(i = 1, \ldots, N\), where R is the hypernymy relation:

1. If \(R(H, h_i)\) corresponds to an entry in the semantic network SN, \(R(H, h_i)\) is validated.
2. If \(R(H, h_i)\) has no entry in SN, but an entry corresponding to \(R(H, h_j)\) exists in SN and \(h_i\) is a neighbour of \(h_j\) in the distributional resource DR, then \(R(H, h_i)\) is validated.
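The two rules can be read as a small decision procedure applied to all items of one PES. The sketch below is an illustration under assumptions: the SN lookup is abstracted as a callable `in_sn(H, h)`, and DR as a dictionary mapping each term to the set of its distributional neighbours; both are hypothetical stand-ins for actual BabelNet and Voisins de Wikipédia queries. As in rule 2, only a sibling \(h_j\) whose relation is found in SN can support \(h_i\), so validations obtained through DR are not chained.

```python
from typing import Callable, Dict, List, Set


def validate_pes(H: str, items: List[str],
                 in_sn: Callable[[str, str], bool],
                 dr_neighbours: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Jointly validate the relations R(H, h_i) of one PES (hypothetical helper)."""
    # Rule 1: R(H, h_i) is validated if the semantic network contains it.
    in_network = {h: in_sn(H, h) for h in items}
    validated: Dict[str, bool] = {}
    for h_i in items:
        if in_network[h_i]:
            validated[h_i] = True
            continue
        # Rule 2: some sibling h_j has R(H, h_j) in SN and h_i is a DR neighbour of h_j.
        validated[h_i] = any(
            in_network[h_j] and h_i in dr_neighbours.get(h_j, set())
            for h_j in items if h_j != h_i
        )
    return validated


# Toy data only: these SN and DR contents are invented for illustration.
sn_entries = {("chromosomal abnormality", "deletion")}
dr = {"deletion": {"insertion"}}
result = validate_pes("chromosomal abnormality", ["deletion", "insertion", "inversion"],
                      in_sn=lambda H, h: (H, h) in sn_entries, dr_neighbours=dr)
print(result)  # {'deletion': True, 'insertion': True, 'inversion': False}
```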

From SN, we retrieve Synsets(H), the set of synsets of H, and \(SuperHyperyms_{SN}^k(h_{i})\), the set of hypernym synsets of \(h_{i}\) up to rank k (k being the maximum length of the path from \(h_{i}\) to one of its hypernym synsets in SN, explored with a depth-first search strategy). From DR, we retrieve \(p(h_i,h_j)\), the semantic proximity between \(h_i\) and \(h_j\). This process is described in Algorithm 1.
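The SN side of this process can be sketched as a depth-first search bounded at depth k, followed by an intersection test, assuming (as the example discussed in Sect. 4 suggests) that \(R(H, h_i)\) counts as found in SN when Synsets(H) and \(SuperHyperyms_{SN}^k(h_i)\) share a synset. In this sketch the network is a toy dictionary from a synset identifier to its direct hypernyms; the identifiers, the path and the function names are invented for illustration and are neither BabelNet data nor BabelNet API calls.

```python
from typing import Dict, List, Set


def super_hypernyms(start: Set[str], hypernyms: Dict[str, List[str]], k: int) -> Set[str]:
    """Hypernym synsets reachable from `start` by a path of length 1..k (depth-first)."""
    found: Set[str] = set()
    best_depth: Dict[str, int] = {}  # shallowest depth at which a synset was expanded

    def dfs(synset: str, depth: int) -> None:
        if depth == k:
            return
        for parent in hypernyms.get(synset, []):
            found.add(parent)
            # Re-expanding from an equal or deeper level cannot reach anything new.
            if best_depth.get(parent, k + 1) <= depth + 1:
                continue
            best_depth[parent] = depth + 1
            dfs(parent, depth + 1)

    for synset in start:
        dfs(synset, 0)
    return found


def found_in_sn(synsets_H: Set[str], synsets_h: Set[str],
                hypernyms: Dict[str, List[str]], k: int = 3) -> bool:
    """Assumed test: Synsets(H) intersects the rank-k hypernym synsets of h."""
    return bool(synsets_H & super_hypernyms(synsets_h, hypernyms, k))


# Toy network (invented path): horn_of_africa -> region -> geographical_area -> land
toy = {"horn_of_africa": ["region"], "region": ["geographical_area"],
       "geographical_area": ["land"]}
country = {"country_nation", "land"}  # toy Synsets(country) sharing the 'land' synset
print(found_in_sn(country, {"horn_of_africa"}, toy, k=3))  # True
print(found_in_sn(country, {"horn_of_africa"}, toy, k=2))  # False
```

The two calls illustrate why the choice of k matters: a shared synset three steps up is missed when the search stops at depth 2. The depth bound also guarantees termination if the network contains cycles.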

4 Experimentation

Data set and resources. The evaluation data set¹ is composed of 67 PES involving 262 candidate relations, automatically extracted from Wikipedia pages [4]. These relations were manually validated by two annotators in a double-blind process; the 27 conflicts identified were resolved. In total, 206 relations were assessed as correct and 56 as incorrect. This set constitutes our gold standard. As resources, we used the multilingual semantic network BabelNet [6] and the distributional resource Voisins de Wikipédia [1]. They were chosen because they support the French language and are built from the same corpus as the one used to construct the evaluation data set.

Results and discussion. Two sets of candidate relations were considered (Table 1): S, the whole set of true positive relations from the gold standard (206 relations), and \(S_{BN}\), the subset of S for which H exists in BabelNet (116 relations). For both sets, 76 of the 78 relations validated by the system were correct. Of these 76, 12 were validated thanks to the distributional resource, which corresponds to a performance improvement of up to 11 %. Recall is lower: 76 relations out of 206 for S, and 76 out of 116 for \(S_{BN}\). In terms of accuracy, 130 out of 262 relations were correctly assessed for the set S, and 88 out of 131 for \(S_{BN}\).
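The ratios behind these figures can be recomputed from the reported counts. The short script below does only that arithmetic: precision is read as correct validations over validations (76/78), recall as correct validations over the correct relations in each set, and accuracy uses the reported ratios (130/262 and 88/131), which round to the 0.50 and 0.67 given in the abstract. The variable names are illustrative.

```python
# Counts reported in the text (identical for S and S_BN).
validated = 78           # relations validated by the system
correct_validated = 76   # validated relations that are correct

precision = correct_validated / validated
recall = {"S": correct_validated / 206, "S_BN": correct_validated / 116}
accuracy = {"S": 130 / 262, "S_BN": 88 / 131}

print(f"precision = {precision:.2f}")
for name in ("S", "S_BN"):
    print(f"{name}: recall = {recall[name]:.2f}, accuracy = {accuracy[name]:.2f}")
```

This prints a precision of 0.97, a recall of 0.37 (S) and 0.66 (\(S_{BN}\)), and an accuracy of 0.50 (S) and 0.67 (\(S_{BN}\)).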

Table 1. Overall results of the validation process combining both SN and DR. (+) corresponds to the specific gain of using DR.

Although precision is high, we could identify the cause of the noisy cases: BabelSynsets group terms of similar meaning. For instance, for the candidate relation R(country, Horn of Africa), the BabelSynset bn:00028934n = {land, dry land, earth, ground, terra firma} belongs to the intersection of the sets \(SuperHyperyms_{BN}^3\)(Horn of Africa) and Synsets(country). Regarding the low recall, we observed two main phenomena. First, 62 hypernyms (from S) have no entry in BabelNet; in this case, no relation within the PES can be validated. Second, setting \(k=3\) (chosen empirically) as the maximum length of the path from \(h_{i}\) to one of its hypernyms seems insufficient. We also observed that the distributional resource helps identify entries missing from the semantic network. For example, the relation R(chromosomal abnormality, insertion) was validated because insertion and deletion are semantically close in the distributional resource. Although the entries in this resource overwhelmingly correspond to single words, while 40 % of our hyponyms are compounds, combining both resources improved performance by up to 11 %. Distributional resources supporting compounds may further improve our results.

5 Conclusions and Future Work

This paper proposed an approach for automatically validating semantic relations that relies on discursive properties and combines a semantic network with a distributional resource. As future work, we plan to exploit alternative resources (in particular, distributional resources with compounds), to analyse the trade-off between depth-first and breadth-first search strategies and their computational complexity, and to exploit larger semantic networks or combine several resources together. We also intend to extend our approach to validate other semantic relations, such as meronymy, synonymy and antonymy.

Notes

1. Available at https://www.irit.fr/~Cassia.Trojahn/PES.zip.

References

1. Adam, C., Fabre, C., Muller, P.: Évaluer et améliorer une ressource distributionnelle : protocole d’annotation de liens sémantiques. TAL 54(1), 71–97 (2013)
2. Asher, N.: Reference to Abstract Objects in Discourse: A Philosophical Semantics for Natural Language Metaphysics. SLAP, vol. 50. Kluwer (1993)
3. Christophe, L.: Représentation et composition des structures visuelles et rhétoriques du texte. Approche pour la génération de textes formatés. PhD thesis (2000)
4. Fauconnier, J.P., Kamel, M.: Discovering hypernymy relations using text layout. In: Joint Conference on Lexical and Computational Semantics, Denver, pp. 249–258. ACL (2015)
5. Hovy, E., Arens, Y.: Automatic generation of formatted text. In: Readings in Intelligent User Interfaces, pp. 256–262. Morgan Kaufmann Publishers (1998)
6. Navigli, R., Ponzetto, S.P.: BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012)

Acknowledgement

Cassia Trojahn is partially supported by the French FUI SparkinData project.

Author information

Authors and Affiliations

Institut de Recherche en Informatique de Toulouse, Toulouse, France
Mouna Kamel & Cassia Trojahn

Corresponding author

Correspondence to Mouna Kamel.

Editor information

Editors and Affiliations

Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam, Potsdam, Germany
Harald Sack
Innovation Development, Istituto Superiore Mario Boella, Turin, Italy
Giuseppe Rizzo
Technical University of Ilmenau, Ilmenau, Germany
Nadine Steinmetz
Artificial Intelligence Laboratory, J. Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić
Institut für Informatik III, University of Bonn, Bonn, Germany
Sören Auer
Institut für Informatik III, Universität Bonn, Bonn, Germany
Christoph Lange

About this paper

Cite this paper

Kamel, M., Trojahn, C. (2016). Taking Advantage of Discursive Properties for Validating Hierarchical Semantic Relations from Parallel Enumerative Structures. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds) The Semantic Web. ESWC 2016. Lecture Notes in Computer Science, vol. 9989. Springer, Cham. https://doi.org/10.1007/978-3-319-47602-5_16

DOI: https://doi.org/10.1007/978-3-319-47602-5_16
Published: 20 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47601-8
Online ISBN: 978-3-319-47602-5
eBook Packages: Computer Science, Computer Science (R0)
