Abstract
Parallel corpora enriched with descriptive annotations facilitate the development of multilingual authoring tools. Starting from an annotated bitext, we show how SGML markup can be recycled to produce complementary language resources. On the one hand, several translation memory databases, together with glossaries of proper nouns, have been produced. On the other, DTDs for source and target documents have been derived and put into correspondence. This paper discusses how these resources have been automatically generated and applied to an interactive bilingual authoring system. This tool is capable of handling a substantial proportion of text in both the composition and the translation of structured documents.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Casillas, A., Abaitua, J., Martinez, R. (2000). Recycling Annotated Parallel Corpora for Bilingual Document Composition. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_12
DOI: https://doi.org/10.1007/3-540-39965-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41117-8
Online ISBN: 978-3-540-39965-0
eBook Packages: Springer Book Archive