A Neural-Machine-Translation System Resilient to Out of Vocabulary Words for Translating Natural Language to SPARQL

Borroto, Manuel; Ricca, Francesco; Cuteri, Bernardo

doi:10.1007/978-3-031-08421-8_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13196)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence


Abstract

The development and diffusion of ontologies has allowed the creation of large banks of information covering multiple domains, known as knowledge bases. Ontologies offer a way to represent information with semantic meaning, which makes the data machine-interpretable. However, exploiting such rich knowledge is difficult for the majority of potential users, who know neither the knowledge-base definition nor how to write queries in SPARQL. Systems able to translate natural language questions into SPARQL queries have the potential to overcome this problem. In this paper, we propose an approach that combines the Named Entity Recognition and Neural Machine Translation tasks to automatically translate natural language questions into executable SPARQL queries. The resulting approach is robust to the presence of terms that do not occur in the training set. We evaluate the potential of our approach on Monument and QALD-9, two well-known datasets for Question Answering over the DBpedia ontology.
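As a rough illustration of why pairing entity recognition with translation can help with unseen terms, the sketch below masks entities with placeholders before translation and restores them afterwards. It is not the authors' system: the gazetteer `KNOWN_ENTITIES`, the lookup table `TEMPLATES` (standing in for a trained translation model), and the `<e0>` placeholder format are all hypothetical, and the `dbr:`/`dbo:` names are used only as plausible DBpedia-style examples.

```python
# Hypothetical gazetteer standing in for a trained NER model.
KNOWN_ENTITIES = {
    "Eiffel Tower": "dbr:Eiffel_Tower",
    "Colosseum": "dbr:Colosseum",
}

# Hypothetical lookup standing in for a trained NMT model:
# it maps a masked question to a masked SPARQL query.
TEMPLATES = {
    "where is the <e0> located": "SELECT ?x WHERE { <e0> dbo:location ?x }",
}


def mask_entities(question):
    """Replace each recognized entity with a placeholder.

    Returns the masked question and a placeholder -> URI mapping.
    """
    mapping = {}
    masked = question
    for surface, uri in KNOWN_ENTITIES.items():
        if surface in masked:
            placeholder = f"<e{len(mapping)}>"
            masked = masked.replace(surface, placeholder)
            mapping[placeholder] = uri
    return masked, mapping


def translate(question):
    """Translate a question to SPARQL, or return None if no template matches."""
    masked, mapping = mask_entities(question)
    # Normalize the masked question before the "translation" lookup.
    key = masked.lower().rstrip("?").strip()
    query = TEMPLATES.get(key)
    if query is None:
        return None
    # Restore the concrete resources in the translated query.
    for placeholder, uri in mapping.items():
        query = query.replace(placeholder, uri)
    return query


print(translate("Where is the Colosseum located?"))
print(translate("Where is the Eiffel Tower located?"))
```

Because the translation step only ever sees placeholders, an entity name that never appeared in the translation training pairs does not enter its vocabulary at all, as long as the recognition step can find it.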


Notes

1. Note that we are interested in computing the answers, and not in reproducing syntactically the gold query.
2. https://github.com/LiberAI/NSpM/tree/master/data.
3. https://github.com/ag-sc/QALD/tree/master/9/data.


Author information

Authors and Affiliations

University of Calabria, 87036, Rende, CS, Italy
Manuel Borroto, Francesco Ricca & Bernardo Cuteri

Corresponding author

Correspondence to Manuel Borroto.

Editor information

Editors and Affiliations

Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Stefania Bandini
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Francesca Gasparini
Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genova, Italy
Viviana Mascardi
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Matteo Palmonari
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Giuseppe Vizzari


Cite this paper

Borroto, M., Ricca, F., Cuteri, B. (2022). A Neural-Machine-Translation System Resilient to Out of Vocabulary Words for Translating Natural Language to SPARQL. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_12


DOI: https://doi.org/10.1007/978-3-031-08421-8_12
Published: 19 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08420-1
Online ISBN: 978-3-031-08421-8
eBook Packages: Computer Science, Computer Science (R0)
