
Dynamic Mask Curriculum Learning for Non-Autoregressive Neural Machine Translation

  • Conference paper

In: Machine Translation (CCMT 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1671)


Abstract

Non-autoregressive neural machine translation (NAT) is becoming a research hotspot because of its fast decoding. However, the gain in decoding speed usually comes at the cost of translation quality: the decoder receives too little target-language information, and forced parallel decoding causes frequent mistranslations and omissions. To address the lack of target-side information, this paper proposes a dynamic mask curriculum learning approach that supplies target-language information to the model. A target-side self-attention layer is added during pre-training to capture target-side information, and the amount of information exposed to the model is adjusted over time through curriculum learning. During fine-tuning and inference the module is disabled, so the model behaves like a standard NAT model. Experiments on two WMT16 translation datasets show a BLEU improvement of up to 4.4 points with no reduction in decoding speed.
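The abstract only outlines the training procedure, so the following is a minimal, hypothetical PyTorch sketch of how a curriculum-driven dynamic target mask could work during pre-training. The linear schedule, the function names (curriculum_mask_ratio, dynamically_mask_targets), and the [MASK]-token convention are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a curriculum schedule that masks more and more of the
# target sentence as pre-training progresses, so the model gradually
# moves from "easy" (most target tokens visible to the target-side
# self-attention layer) to the fully masked, NAT-like setting.
import torch


def curriculum_mask_ratio(step: int, total_steps: int,
                          start: float = 0.3, end: float = 1.0) -> float:
    """Fraction of target tokens to mask at a given training step.

    Assumes a simple linear schedule; the paper's exact schedule is
    not specified in the abstract.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress


def dynamically_mask_targets(tgt_tokens: torch.Tensor, mask_id: int,
                             pad_id: int, ratio: float) -> torch.Tensor:
    """Replace a `ratio` fraction of non-pad target tokens with [MASK].

    The remaining visible tokens are what the extra target-side
    self-attention layer can attend to during pre-training; at
    fine-tuning/inference the layer is disabled and no target tokens
    are revealed.
    """
    non_pad = tgt_tokens.ne(pad_id)
    scores = torch.rand_like(tgt_tokens, dtype=torch.float)
    scores.masked_fill_(~non_pad, -1.0)           # never select padding
    num_to_mask = (non_pad.sum(dim=1).float() * ratio).long()

    masked = tgt_tokens.clone()
    for i in range(tgt_tokens.size(0)):           # per-sentence masking
        k = int(num_to_mask[i].item())
        if k > 0:
            idx = scores[i].topk(k).indices
            masked[i, idx] = mask_id
    return masked
```

As a usage sketch, the ratio would be recomputed every step (e.g. ratio = curriculum_mask_ratio(step, total_steps)) and applied to each target batch before it is fed to the decoder during pre-training.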




Author information

Corresponding author: Hongxu Hou.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, Y. et al. (2022). Dynamic Mask Curriculum Learning for Non-Autoregressive Neural Machine Translation. In: Xiao, T., Pino, J. (eds) Machine Translation. CCMT 2022. Communications in Computer and Information Science, vol 1671. Springer, Singapore. https://doi.org/10.1007/978-981-19-7960-6_8


  • DOI: https://doi.org/10.1007/978-981-19-7960-6_8


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7959-0

  • Online ISBN: 978-981-19-7960-6

  • eBook Packages: Computer Science (R0)
