Abstract
We address the problem of simplifying Portuguese texts at the sentence level by treating it as a “translation task”. We use the Statistical Machine Translation (SMT) framework to learn how to translate from complex to simplified sentences. Given a parallel corpus of original and simplified texts, aligned at the sentence level, we train a standard SMT system and evaluate the “translations” it produces using both standard SMT metrics such as BLEU and manual inspection. Results are promising according to both evaluations: although the model is usually overcautious in producing simplifications, it does not degrade the overall quality of the sentences, and it appropriately captures certain types of simplification operations, mainly lexical ones.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Specia, L. (2010). Translating from Complex to Simplified Sentences. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_5
DOI: https://doi.org/10.1007/978-3-642-12320-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12319-1
Online ISBN: 978-3-642-12320-7
eBook Packages: Computer Science, Computer Science (R0)