
An automatic evaluation metric for Ancient-Modern Chinese translation

  • Review
  • Published in: Neural Computing and Applications

Abstract

As a written language used for thousands of years, Ancient Chinese has special characteristics such as complex semantics (e.g., polysemy) and one-to-many alignment with Modern Chinese. It can therefore be translated in many completely different yet equally correct ways. In the absence of multiple references, reference-dependent metrics such as the Bilingual Evaluation Understudy (BLEU) cannot recognize such potentially correct translations, and research on automatic evaluation of Ancient-Modern Chinese translation is entirely lacking. In this paper, we propose an automatic evaluation metric for Ancient-Modern Chinese translation called DTE (Dual-based Translation Evaluation), which can evaluate one-to-many alignment in the absence of multiple references. When evaluating with DTE, we found that proper nouns were often translated incorrectly. We therefore designed a new word segmentation method that improves the translation of proper nouns without increasing the size of the model vocabulary. Experiments show that DTE agrees with the judgments of human experts more closely than several general-purpose metrics. Meanwhile, the new word segmentation method helps Ancient-Modern Chinese translation models translate proper nouns better and achieve higher scores on both BLEU and DTE.
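This excerpt does not spell out how DTE is computed, but note 4 below indicates that the metric compares the Ancient Chinese source sentence with a retranslation produced by the symmetrical Modern-Ancient model. The following is only a minimal sketch of that dual/back-translation idea: `modern_to_ancient`, `embed`, and `tokenize` are hypothetical interfaces standing in for the reverse translation model, a word-embedding lookup, and a word segmenter, and cosine similarity over averaged word vectors is an assumed placeholder for the paper's actual scoring function.

```python
# Minimal sketch of a dual-based evaluation score (illustration only).
from typing import Callable, List
import numpy as np

def dual_eval_score(
    ancient_source: str,
    modern_candidate: str,
    modern_to_ancient: Callable[[str], str],   # reverse translation model (assumed interface)
    embed: Callable[[str], np.ndarray],        # word -> vector lookup (assumed interface)
    tokenize: Callable[[str], List[str]],      # word segmenter (assumed interface)
) -> float:
    """Back-translate the candidate and compare it with the original Ancient source."""
    retranslation = modern_to_ancient(modern_candidate)

    def sentence_vector(sentence: str) -> np.ndarray:
        # Average word vectors as a simple stand-in for a sentence representation.
        vectors = [embed(tok) for tok in tokenize(sentence)]
        return np.mean(vectors, axis=0)

    a = sentence_vector(ancient_source)
    b = sentence_vector(retranslation)
    # Cosine similarity as a stand-in for the paper's scoring function.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```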


Notes

  1. In the Experiment section, we report length statistics for the Ancient-Modern Chinese corpus in Table 1. The average length of the augmented sentences is about a dozen words, and the original sentences are shorter than that.

  2. To ensure an objective comparison, the English translations in all figures are literal translations without modification.

  3. ‘Elegance and delicacy’ is also an evaluation requirement in this theory. Since this evaluation is very subjective and our task is to translate narrative Ancient Chinese, which records facts of Chinese history, we ignored it here.

  4. The two sentences are the Ancient Chinese input sentence and the retranslation produced by the symmetrical Modern-Ancient Chinese translation model.

  5. This special dictionary, which we collected, contains about 6000 entries, including personal names, place names, and other proper nouns that appear frequently in ancient China (see the segmentation sketch after these notes).

  6. Most translations of classical poems are subjective, vary widely between versions, and require a lot of additional explanation.

  7. A clause is a fragment obtained by splitting a sentence at commas, semicolons, periods, exclamation marks, and question marks (see the clause-splitting sketch after these notes).

  8. The code is implemented based on https://github.com/huggingface/transformers.

  9. The weights are determined by the positions of the options in the ranking. For example, with three options to rank, the first position has a weight of 3, the second a weight of 2, and the third a weight of 1.

  10. Similar to the way the human expert ranking results are converted into scores, we ranked the three candidate sentences from high to low in the automatic ranking method; the first one gets 3 points, the second 2 points, and the third 1 point, thus converting the ranking into discrete scores (see the scoring sketch after these notes).
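Note 5 mentions a special dictionary of roughly 6000 proper nouns. The sketch below shows only the dictionary-lookup part of the idea, keeping proper nouns as single segmentation units; jieba and the file name are assumptions, not the authors' implementation, and this does not reproduce the paper's full segmentation method (which also avoids increasing the model vocabulary).

```python
# Illustration only: load a custom proper-noun dictionary (one entry per line)
# so that personal and place names are kept as single tokens during segmentation.
import jieba

jieba.load_userdict("ancient_proper_nouns.txt")  # hypothetical dictionary file

def segment(sentence: str) -> list:
    """Segment a sentence, keeping dictionary proper nouns intact."""
    return jieba.lcut(sentence)
```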
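Note 7 defines a clause operationally. The snippet below is one straightforward way to perform that split; the exact delimiter set (covering both Chinese and ASCII punctuation) is an assumption.

```python
import re

# Split a sentence into clauses at commas, semicolons, periods,
# exclamation marks and question marks, following the definition in note 7.
CLAUSE_DELIMITERS = r"[，,；;。.！!？?]"

def split_clauses(sentence: str) -> list:
    parts = re.split(CLAUSE_DELIMITERS, sentence)
    return [p.strip() for p in parts if p.strip()]
```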
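Notes 9 and 10 describe how rankings are converted into discrete scores. The small helper below shows the arithmetic for the three-candidate case described there; the function name is illustrative.

```python
def rank_to_scores(ranked_candidates: list) -> dict:
    """Convert a ranking (best first) into scores: with three candidates,
    the first gets 3 points, the second 2, the third 1."""
    n = len(ranked_candidates)
    return {cand: n - i for i, cand in enumerate(ranked_candidates)}

# Example: three candidate translations ranked from best to worst.
scores = rank_to_scores(["cand_A", "cand_B", "cand_C"])
# -> {"cand_A": 3, "cand_B": 2, "cand_C": 1}
```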

References

  1. Agarwal A, Lavie A (2008) Meteor, m-Bleu and m-Ter: evaluation metrics for high-correlation with human rankings of machine translation output. In: WMT-08, ACL

  2. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: ICLR

  3. Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluation the role of bleu in machine translation research. In: EACL

  4. Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2007) (meta-) Evaluation of machine translation. In: WMT-07, ACL

  5. Chang PC, Galley M, Manning CD (2008) Optimizing Chinese word segmentation for machine translation performance. In: WMT-08, ACL

  6. Cheng Y, Tu Z, Meng F, Zhai J, Liu Y (2018) Towards robust neural machine translation. In: ACL

  7. Fu Z, Tan X, Peng N, Zhao D, Yan R (2018) Style transfer in text: exploration and evaluation. In: AAAI

  8. Gomaa WH, Fahmy AA (2013) A survey of text similarity approaches. Int J Comput Appl 68(13):13–18


  9. He D, Xia Y, Qin T, Wang L, Yu N, Liu TY, Ma WY (2016) Dual learning for machine translation. In: NIPS

  10. Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: EACL

  11. Lample G, Conneau A, Denoyer L, Ranzato M (2018) Unsupervised machine translation using monolingual corpora only. In: ICLR

  12. Lavie A, Agarwal A (2007) Meteor: an automatic metric for mt evaluation with high levels of correlation with human judgments. In: WMT-07, ACL

  13. Li J, Song Y, Zhang H, Chen D, Shi S, Zhao D, Yan R (2018) Generating classical Chinese poems via conditional variational autoencoder and adversarial training. In: EMNLP

  14. Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text Summarization Branches Out

  15. Liu D, Fu J, Qu Q, Lv J (2018) Bfgan: backward and forward generative adversarial networks for lexically constrained sentence generation. arXiv preprint arXiv:1806.08097

  16. Liu D, Yang K, Qu Q, Lv J (2019) Ancient-modern Chinese translation with a new large training dataset. In: TALLIP

  17. Liu D, Yang X, He F, Chen Y, Lv J (2019b) mu-forcing: training variational recurrent autoencoders for text generation. arXiv preprint arXiv:1905.10072

  18. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS

  19. Mitchell J, Lapata M (2008) Vector-based models of semantic composition. In: ACL

  20. Nakov P, Guzman F, Vogel S (2012) Optimizing for sentence-level bleu+ 1 yields short translations. In: COLING

  21. Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: ACL

  22. Pennington J, Socher R, Manning C (2014) Glove: Global vectors for word representation. In: EMNLP

  23. Schwartz B (2009) In search of wealth and power: Yen Fu and the West. Harvard University Press, Cambridge


  24. Shimanaka H, Kajiwara T, Komachi M (2019) Machine translation evaluation with BERT regressor. CoRR abs/1907.12679. arXiv:1907.12679

  25. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA

  26. Su J, Zeng J, Xiong D, Liu Y, Wang M, Xie J (2018) A hierarchy-to-sequence attentional neural machine translation model. In: TASLP

  27. Sundermeyer M, Alkhouli T, Wuebker J, Ney H (2014) Translation modeling with bidirectional recurrent neural networks. In: EMNLP

  28. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: NIPS

  29. Wieting J, Bansal M, Gimpel K, Livescu K (2016) Towards universal paraphrastic sentence embeddings. In: ICLR

  30. Yu L, Zhang W, Wang J, Yu Y (2017) Seqgan: Sequence generative adversarial nets with policy gradient. In: AAAI

  31. Zhang H, Li J, Ji Y, Yue H (2016) Understanding subtitles by character-level sequence-to-sequence learning. IEEE Trans Indus Inf 13(2):616–624


  32. Zhang J, Zong C (2016) Bridging neural machine translation and bilingual dictionaries. arXiv preprint arXiv:1610.07272

  33. Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2020) Bertscore: Evaluating text generation with BERT. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020, OpenReview.net. https://openreview.net/forum?id=SkeHuCVFDr

  34. Zhang WX, Qiu LK, Song ZY, Chen B (2012) Corpus-based quantitative analysis on stylistic difference of Chinese synonyms. Chin Lang Learn 3:72–80


  35. Zhang Z, Li W, Sun X (2018) Automatic transferring between ancient Chinese and contemporary Chinese. arXiv preprint arXiv:1803.01557


Acknowledgements

This work is supported by the National Key R&D Program of China Grant No. 2017YFB1002201, the National Natural Science Fund for Distinguished Young Scholar Grant No. 61625204, and partially supported by the Sichuan Science and Technology Major Project Grant No. 2018GZDZX0028.

Author information


Corresponding author

Correspondence to Yongsheng Sang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, K., Liu, D., Qu, Q. et al. An automatic evaluation metric for Ancient-Modern Chinese translation. Neural Comput & Applic 33, 3855–3867 (2021). https://doi.org/10.1007/s00521-020-05216-8

