Neural Machine Translation Based on XLM-R Cross-lingual Pre-training Language Model

Cited by: 0
Authors
Wang Q. [1 ]
Li M. [1 ]
Wu S. [1 ]
Wang M. [1 ]
Affiliations
[1] School of Computer and Information Engineering, Jiangxi Normal University, Nanchang
Source
Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis | 2022, Vol. 58, No. 1
Keywords
Cross-lingual pre-training language model; Fine-tuning; Neural machine translation; Transformer neural network; XLM-R model
DOI
10.13209/j.0479-8023.2021.109
Abstract
The authors explore applying the XLM-R cross-lingual pre-trained language model on the source side, on the target side, and on both sides to improve machine translation quality, and propose three neural network models that integrate pre-trained XLM-R multilingual word representations into the Transformer encoder, into the Transformer decoder, and into both, respectively. Experimental results on the WMT English-German, IWSLT English-Portuguese, and English-Vietnamese machine translation benchmarks show that integrating the XLM-R model into the Transformer encoder effectively encodes the source sentences and improves system performance for resource-rich translation tasks. For resource-poor translation tasks, integrating the XLM-R model not only encodes the source sentences well but also supplements source-language and target-language knowledge at the same time, thus improving translation quality. © 2022 Peking University.
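A minimal sketch of the first variant described in the abstract, feeding pre-trained XLM-R contextual representations into a Transformer encoder in place of learned source embeddings. This is not the authors' implementation: the checkpoint name, dimensions, frozen-XLM-R setup, and fusion strategy below are illustrative assumptions based on the public Hugging Face XLM-R model and PyTorch's Transformer modules.

```python
# Illustrative sketch only: XLM-R token representations used as the input to an
# NMT Transformer encoder. The model names and hyperparameters are assumptions.
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
xlmr = XLMRobertaModel.from_pretrained("xlm-roberta-base")
xlmr.eval()  # keep the pre-trained model frozen in this sketch


class XLMREncoder(nn.Module):
    """Transformer encoder stack that consumes XLM-R contextual embeddings."""

    def __init__(self, d_model=768, nhead=8, num_layers=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, src_embeddings, src_key_padding_mask=None):
        return self.encoder(
            src_embeddings, src_key_padding_mask=src_key_padding_mask
        )


# Encode a source sentence with XLM-R, then pass it through the NMT encoder;
# the resulting memory would be attended to by a Transformer decoder.
sentence = "Neural machine translation benefits from cross-lingual pre-training."
batch = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    xlmr_out = xlmr(**batch).last_hidden_state  # shape: (1, seq_len, 768)

encoder = XLMREncoder()
memory = encoder(xlmr_out)
print(memory.shape)
```

The decoder-side and two-sided variants mentioned in the abstract would supply XLM-R representations to the Transformer decoder (or to both encoder and decoder) in an analogous way; the abstract does not specify the exact fusion mechanism, so this sketch only illustrates the encoder-side idea.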
Pages: 29–36
Page count: 7