
Dynamic Mask Curriculum Learning for Non-Autoregressive Neural Machine Translation

  • Conference paper

In: Machine Translation (CCMT 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1671)


Abstract

Non-autoregressive neural machine translation (NAT) is becoming a research hotspot because of its fast decoding. However, the gain in decoding speed usually comes at the cost of translation quality: the decoder receives too little target-language information, and forced parallel decoding causes frequent mistranslations and omissions. To address the lack of target-side information, this paper proposes a dynamic mask curriculum learning approach that supplies target-language information to the model. A target-side self-attention layer is added during pre-training to capture target-side information, and the amount of information exposed to the model is adjusted over time through curriculum learning. During fine-tuning and inference the module is disabled, so the model behaves like a standard NAT model. Experiments on two WMT16 translation datasets show a BLEU improvement of up to 4.4 points with no reduction in decoding speed.
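The abstract only outlines the training procedure, so the following is a minimal, hypothetical PyTorch sketch of how a curriculum-driven dynamic target mask could work during pre-training. The linear schedule, the function names (curriculum_mask_ratio, dynamically_mask_targets), and the [MASK]-token convention are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a curriculum schedule that masks more and more of the
# target sentence as pre-training progresses, so the model gradually
# moves from "easy" (most target tokens visible to the target-side
# self-attention layer) to the fully masked, NAT-like setting.
import torch


def curriculum_mask_ratio(step: int, total_steps: int,
                          start: float = 0.3, end: float = 1.0) -> float:
    """Fraction of target tokens to mask at a given training step.

    Assumes a simple linear schedule; the paper's exact schedule is
    not specified in the abstract.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress


def dynamically_mask_targets(tgt_tokens: torch.Tensor, mask_id: int,
                             pad_id: int, ratio: float) -> torch.Tensor:
    """Replace a `ratio` fraction of non-pad target tokens with [MASK].

    The remaining visible tokens are what the extra target-side
    self-attention layer can attend to during pre-training; at
    fine-tuning/inference the layer is disabled and no target tokens
    are revealed.
    """
    non_pad = tgt_tokens.ne(pad_id)
    scores = torch.rand_like(tgt_tokens, dtype=torch.float)
    scores.masked_fill_(~non_pad, -1.0)           # never select padding
    num_to_mask = (non_pad.sum(dim=1).float() * ratio).long()

    masked = tgt_tokens.clone()
    for i in range(tgt_tokens.size(0)):           # per-sentence masking
        k = int(num_to_mask[i].item())
        if k > 0:
            idx = scores[i].topk(k).indices
            masked[i, idx] = mask_id
    return masked
```

As a usage sketch, the ratio would be recomputed every step (e.g. ratio = curriculum_mask_ratio(step, total_steps)) and applied to each target batch before it is fed to the decoder during pre-training.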




Author information

Corresponding author: Hongxu Hou.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, Y. et al. (2022). Dynamic Mask Curriculum Learning for Non-Autoregressive Neural Machine Translation. In: Xiao, T., Pino, J. (eds) Machine Translation. CCMT 2022. Communications in Computer and Information Science, vol 1671. Springer, Singapore. https://doi.org/10.1007/978-981-19-7960-6_8


  • DOI: https://doi.org/10.1007/978-981-19-7960-6_8


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7959-0

  • Online ISBN: 978-981-19-7960-6

  • eBook Packages: Computer Science (R0)
