Recycling Annotated Parallel Corpora for Bilingual Document Composition

  • Conference paper
Envisioning Machine Translation in the Information Future (AMTA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1934)

Abstract

Parallel corpora enriched with descriptive annotations facilitate the development of multilingual authoring tools. Starting from an annotated bitext, we show how SGML markup can be recycled to produce complementary language resources. On the one hand, several translation memory databases, together with glossaries of proper nouns, have been produced. On the other hand, DTDs for source and target documents have been derived and put into correspondence. This paper discusses how these resources have been automatically generated and applied to an interactive bilingual authoring system. This tool is capable of handling a substantial proportion of text in both the composition and the translation of structured documents.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Casillas, A., Abaitua, J., Martinez, R. (2000). Recycling Annotated Parallel Corpora for Bilingual Document Composition. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_12

  • DOI: https://doi.org/10.1007/3-540-39965-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41117-8

  • Online ISBN: 978-3-540-39965-0

  • eBook Packages: Springer Book Archive
