Abstract
Reordering is an essential problem in phrase-based statistical machine translation (SMT). In this paper, we propose an approach that automatically learns reordering rules, applied as a preprocessing step and based on a dependency parser, for English-to-Vietnamese phrase-based SMT. We use dependency parsing and rules extracted by training feature-rich discriminative classifiers to reorder source-side sentences. We evaluate our approach on English-Vietnamese machine translation tasks and show that it outperforms the baseline phrase-based SMT system.
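To make the idea of source-side preordering concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a dependency tree given as head indices and relation labels, and it replaces the trained classifier with a hypothetical hand-written `classify` function. The single rule it encodes (an adjectival modifier moves after its head noun) is only an illustration of an English-to-Vietnamese word-order difference. The names `Token`, `classify`, and `preorder` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 0-based position in the source sentence
    word: str
    head: int  # index of the head token, -1 for the root
    rel: str   # dependency relation to the head


def classify(head: Token, child: Token) -> str:
    """Hypothetical stand-in for a trained classifier.

    Returns "before" or "after": where the child's subtree should be
    placed relative to its head in the reordered sentence.
    """
    if child.rel == "amod":
        # Illustrative rule only: adjectives follow the noun in Vietnamese.
        return "after"
    # Otherwise keep the original side of the head.
    return "before" if child.idx < head.idx else "after"


def preorder(tokens):
    """Linearize the dependency tree using the per-child decisions."""
    children = {t.idx: [] for t in tokens}
    root = None
    for t in tokens:
        if t.head == -1:
            root = t
        else:
            children[t.head].append(t)

    def linearize(node):
        before, after = [], []
        for c in children[node.idx]:  # children stay in source order
            (before if classify(node, c) == "before" else after).append(c)
        out = []
        for c in before:
            out.extend(linearize(c))
        out.append(node.word)
        for c in after:
            out.extend(linearize(c))
        return out

    return linearize(root)


# "I bought a red car" with a hand-written dependency parse.
sentence = [
    Token(0, "I", 1, "nsubj"),
    Token(1, "bought", -1, "root"),
    Token(2, "a", 4, "det"),
    Token(3, "red", 4, "amod"),
    Token(4, "car", 1, "dobj"),
]
reordered = preorder(sentence)
print(" ".join(reordered))  # I bought a car red
```

In a real pipeline, the reordered source sentences would replace the originals in both training and test data before the phrase-based system is built, so that word alignment and phrase extraction see source text already closer to target word order.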
Acknowledgements
The work described in this paper was partially funded by Hanoi National University (QG.15.23 project).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Tran, V.H., Vu, H.T., Nguyen, V.V., Nguyen, M.L. (2018). A Classifier-Based Preordering Approach for English-Vietnamese Statistical Machine Translation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_7
DOI: https://doi.org/10.1007/978-3-319-75487-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)