
A Multi-view Approach for Term Translation Spotting

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Abstract

This paper presents a multi-view approach to term translation spotting based on a bilingual lexicon and comparable corpora. We study three levels of representation for a term: its context, its theme and its orthography. These three views are evaluated individually and then combined to rank translation candidates. We focus on French-English medical terms. Experiments show a significant improvement over the classical context-based approach, with an F-score of 40.3% for the top-ranked translation candidates.
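To make the idea of combining views concrete, here is a minimal sketch of multi-view candidate ranking. It is not the authors' implementation; every modelling choice below is an assumption. The context view projects a French context vector into English through the bilingual lexicon and compares it to each candidate's context vector with cosine similarity. The theme view compares topic distributions (assumed to come from some topic model) with cosine similarity. The orthographic view uses a length-normalised Levenshtein similarity. The three scores are merged with a weighted sum, and the equal weights are a hypothetical default. The function names (`project`, `ortho_sim`, `rank_candidates`) and the toy data are illustrative only.

```python
from collections import Counter
from math import sqrt


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def project(context: dict[str, float], lexicon: dict[str, str]) -> dict[str, float]:
    """Context view (assumed): translate a source context vector word by word
    with a bilingual lexicon; words missing from the lexicon are dropped."""
    projected: Counter = Counter()
    for word, weight in context.items():
        if word in lexicon:
            projected[lexicon[word]] += weight
    return dict(projected)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ortho_sim(a: str, b: str) -> float:
    """Orthographic view (assumed): edit distance normalised to [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def rank_candidates(term, src_context, src_topics, candidates, lexicon,
                    weights=(1 / 3, 1 / 3, 1 / 3)):
    """Score each candidate with the three views and sort by a weighted sum.
    `candidates` maps a target term to (context_vector, topic_vector).
    The linear combination and its weights are hypothetical."""
    wc, wt, wo = weights
    projected = project(src_context, lexicon)
    scored = []
    for cand, (ctx, topics) in candidates.items():
        score = (wc * cosine(projected, ctx)
                 + wt * cosine(src_topics, topics)
                 + wo * ortho_sim(term, cand))
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage with invented data: rank English candidates for French "hypertension".
lexicon = {"douleur": "pain", "cardiaque": "cardiac", "patient": "patient"}
src_context = {"douleur": 2.0, "patient": 1.0}
src_topics = {"t0": 0.8, "t1": 0.2}
candidates = {
    "hypertension": ({"pain": 1.0, "patient": 1.0}, {"t0": 0.7, "t1": 0.3}),
    "fracture": ({"bone": 2.0}, {"t0": 0.1, "t1": 0.9}),
}
ranking = rank_candidates("hypertension", src_context, src_topics, candidates, lexicon)
print(ranking)  # "hypertension" is ranked first
```

A linear combination is only one way to merge views. It keeps each view's contribution visible and lets the weights be tuned on held-out term pairs, but how the paper actually combines the views is not stated in the abstract.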




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rubino, R., Linarès, G. (2011). A Multi-view Approach for Term Translation Spotting. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_3


  • DOI: https://doi.org/10.1007/978-3-642-19437-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer Science, Computer Science (R0)
