Controlled Knowledge Base Enrichment from Web Documents

Mrabet, Yassine; Bennacer, Nacéra; Pernelle, Nathalie

doi:10.1007/978-3-642-35063-4_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7651)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

The Linked Open Data initiative has led to more and more RDF data sources being published on the Web. However, these data sources contain relatively little information compared to the documents available on the surface Web. Many annotation tools have been proposed in the last decade for the automatic construction and enrichment of knowledge bases. Yet while notable advances have been made in extracting concept instances, the extraction of semantic relations remains a challenging task when the structures and vocabularies of the target documents are heterogeneous. In this paper, we propose a novel approach, called REISA, which enriches RDF/OWL knowledge bases with semantic relations using semistructured documents annotated with concept instances. REISA produces weighted relation instances without exploiting lexico-syntactic or structural regularities in the documents. Neighboring domain entities in the annotated documents are used to generate the first sets of candidate relations according to the domain and range axioms defined in a domain ontology. The construction of these candidate sets relies on automated semantic controls performed with (i) the existing knowledge bases and (ii) the (inverse) functionality of the target relations. The selected relation candidates are weighted according to the neighborhood distance between the annotated domain entities in the document. Experiments on two real web datasets show that (i) REISA extracts semantic relationships with precision values reaching 76.5% and that (ii) the weighting method is effective for ranking the relation candidates according to their precision.
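
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the REISA implementation: the abstract does not give the actual algorithm, so the data model (Entity, Relation), the position-based distance, the max_distance cutoff, and the 1/(1+d) weighting are all assumptions chosen for illustration. Only the functionality control is sketched; inverse functionality would be checked symmetrically on the object side.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    uri: str       # hypothetical identifier of an annotated concept instance
    concept: str   # concept assigned by the (assumed) annotation step
    position: int  # assumed position of the annotated node in the document


@dataclass(frozen=True)
class Relation:
    name: str
    domain: str               # concept required for the subject
    range: str                # concept required for the object
    functional: bool = False  # at most one object per subject


def candidate_relations(entities, relations, known_triples, max_distance=3):
    """Return (subject, relation, object, weight) tuples, best weight first."""
    candidates = []
    for subj, obj in permutations(entities, 2):
        distance = abs(subj.position - obj.position)
        if distance > max_distance:
            continue  # only neighboring entities are paired (assumed cutoff)
        for rel in relations:
            # Control 1: domain and range axioms of the ontology.
            if (subj.concept, obj.concept) != (rel.domain, rel.range):
                continue
            # Control 2: a functional relation already holding a different
            # object for this subject in the knowledge base is rejected.
            if rel.functional and any(
                s == subj.uri and p == rel.name and o != obj.uri
                for s, p, o in known_triples
            ):
                continue
            # Assumed weighting: closer entities get a higher weight.
            weight = 1.0 / (1.0 + distance)
            candidates.append((subj.uri, rel.name, obj.uri, weight))
    return sorted(candidates, key=lambda c: c[3], reverse=True)


# Hypothetical example data; the URIs and relation names are invented.
entities = [
    Entity("ex:paper1", "Article", 0),
    Entity("ex:alice", "Person", 1),
    Entity("ex:bob", "Person", 4),
    Entity("ex:venueB", "Venue", 2),
]
relations = [
    Relation("hasAuthor", "Article", "Person"),
    Relation("publishedIn", "Article", "Venue", functional=True),
]
known_triples = {("ex:paper1", "publishedIn", "ex:venueA")}

for candidate in candidate_relations(entities, relations, known_triples):
    print(candidate)
```

In this example, ex:bob is too far from ex:paper1 to be paired with it, and the publishedIn candidate is rejected because the knowledge base already holds a different venue for the same article. Only the hasAuthor candidate between ex:paper1 and ex:alice survives.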

Author information

Authors and Affiliations

LRI, Université Paris-sud, PCRI, bât. 690, 91405, Orsay, France
Yassine Mrabet & Nathalie Pernelle
Supélec, E3S 3 rue Joliot Curie, 91192, GIF-SUR-YVETTE, France
Nacéra Bennacer

Editor information

Editors and Affiliations

School of Computer Science, Fudan University, 825 Zhangheng Rd., Shanghai, 201203, China
X. Sean Wang
Department of Computer Science, College of Engineering, Science and Engineering Offices, The University of Illinois at Chicago, 851 South Morgan Street (M/C 152), 60607-7053, Chicago, Illinois, USA
Isabel Cruz
Department of Informatics and Telecommunications, University of Athens, GR15784, Ilisia, Athens, Greece
Alex Delis
Centre for Applied Informatics, Victoria University, PO Box 14428, 8001, Melbourne, VIC, Australia
Guangyan Huang

About this paper

Cite this paper

Mrabet, Y., Bennacer, N., Pernelle, N. (2012). Controlled Knowledge Base Enrichment from Web Documents. In: Wang, X.S., Cruz, I., Delis, A., Huang, G. (eds) Web Information Systems Engineering - WISE 2012. WISE 2012. Lecture Notes in Computer Science, vol 7651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35063-4_23

DOI: https://doi.org/10.1007/978-3-642-35063-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35062-7
Online ISBN: 978-3-642-35063-4
eBook Packages: Computer Science, Computer Science (R0)
