An Approach for Extracting Bilingual Terminology from Wikipedia

Erdmann, Maike; Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro

doi:10.1007/978-3-540-78568-2_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4947)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

With the demand for bilingual dictionaries that cover domain-specific terminology, research on automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries built from bilingual text corpora are often insufficient for domain-specific terms. We therefore present an approach to extracting bilingual dictionaries from the link structure of Wikipedia, a large-scale encyclopedia that contains a vast number of links between articles in different languages. Our methods not only analyze these interlanguage links but also extract further translation candidates from redirect pages and link text. In an experiment, we demonstrated the advantages of our methods over a traditional approach that extracts bilingual terminology from parallel corpora.
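The abstract names three sources of translation candidates: interlanguage links, redirect pages, and link text. The following is a minimal sketch of how such candidates might be collected. It is not the authors' implementation: the function name `translation_candidates`, the dictionary-based input layout, and the English/German toy entries are all assumptions made for illustration, and no scoring or filtering of candidates is attempted.

```python
from collections import defaultdict


def translation_candidates(interlanguage_links, redirects, anchor_texts):
    """Collect target-language translation candidates per source title.

    All three inputs use a hypothetical layout chosen for this sketch:
      interlanguage_links: source-language title -> target-language title
      redirects:           target-language redirect title -> article title
      anchor_texts:        target-language article title -> list of link texts
    """
    # Invert the redirect map so each article knows its redirect titles.
    redirects_to = defaultdict(set)
    for alias, title in redirects.items():
        redirects_to[title].add(alias)

    candidates = {}
    for source_title, target_title in interlanguage_links.items():
        # The interlanguage link itself yields the first candidate.
        found = {target_title}
        # Redirect titles pointing at the target article are further candidates.
        found |= redirects_to.get(target_title, set())
        # So are the link texts used to refer to the target article.
        found |= set(anchor_texts.get(target_title, []))
        candidates[source_title] = found
    return candidates


# Toy data invented for this example (not taken from the paper).
interlanguage_links = {"United States": "Vereinigte Staaten"}
redirects = {"USA": "Vereinigte Staaten"}
anchor_texts = {"Vereinigte Staaten": ["Vereinigten Staaten", "USA"]}

result = translation_candidates(interlanguage_links, redirects, anchor_texts)
```

Sets are used so that a term found through more than one source, such as "USA" in the toy data, appears only once among the candidates.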


Author information

Authors and Affiliations

Dept. of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, 565-0871, Japan
Maike Erdmann, Kotaro Nakayama, Takahiro Hara & Shojiro Nishio

Editor information

Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

About this paper

Cite this paper

Erdmann, M., Nakayama, K., Hara, T., Nishio, S. (2008). An Approach for Extracting Bilingual Terminology from Wikipedia. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_28

DOI: https://doi.org/10.1007/978-3-540-78568-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science, Computer Science (R0)
