Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Cited by: 0
Authors
Wang, Wenxuan [1 ]
Jiao, Wenxiang [2 ]
Hao, Yongchang [3 ]
Wang, Xing [2 ]
Shi, Shuming [2 ]
Tu, Zhaopeng [2 ]
Lyu, Michael R. [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA 98004 USA
[3] Univ Alberta, Edmonton, AB, Canada
Source
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1: Long Papers | 2022
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: On one hand, it helps NMT models to produce more diverse translations and reduce adequacy-related translation errors. On the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit the translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach can consistently improve both translation performance and model robustness upon Seq2Seq pretraining.
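The two remedies named in the abstract can be illustrated as small changes to the data pipeline rather than to the model. The sketch below is plain Python with no ML framework; the mask token, the masking ratios, and the example sentences are assumptions for illustration, not details taken from the paper, and the input-adaptation part reflects only one plausible reading of the idea (noising a fraction of fine-tuning source inputs so they resemble pretraining inputs), not necessarily the authors' exact procedure. It builds mBART-style text-infilling pairs that could be used for continued denoising pretraining on in-domain monolingual text, and reuses the same noiser on some parallel source sentences during fine-tuning.

import random

MASK = "<mask>"  # placeholder; the real mask token depends on the pretrained model


def text_infilling(sentence, mask_ratio=0.35, avg_span_len=3, rng=random):
    """Return a (noised input, original target) pair for denoising pretraining:
    roughly mask_ratio of the tokens are covered by spans, and each masked
    span is replaced by a single MASK symbol (mBART-style text infilling)."""
    tokens = sentence.split()
    budget = max(1, round(mask_ratio * len(tokens)))  # tokens left to mask
    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # draw a span length, capped by the budget and the sentence end
            span = max(1, min(int(rng.expovariate(1.0 / avg_span_len)) + 1,
                              budget, len(tokens) - i))
            noised.append(MASK)
            i += span
            budget -= span
        else:
            noised.append(tokens[i])
            i += 1
    return " ".join(noised), sentence


# In-domain pretraining (sketch): apply the same denoising objective to
# monolingual text from the test domain before NMT fine-tuning.
in_domain_monolingual = [
    "the patient received 20 mg of the study drug twice daily",
    "adverse events were reported in both treatment groups",
]  # illustrative sentences, not data from the paper
pretraining_pairs = [text_infilling(s) for s in in_domain_monolingual]

# Input adaptation (sketch of one plausible reading): during fine-tuning,
# noise a small fraction of source sentences so the encoder keeps seeing
# pretraining-like inputs, narrowing the objective gap.
parallel_data = [("das medikament wurde zweimal taeglich verabreicht",
                  "the drug was administered twice daily")]
adapted = [(text_infilling(src)[0] if random.random() < 0.1 else src, tgt)
           for src, tgt in parallel_data]

print(pretraining_pairs[0])
print(adapted[0])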
Pages: 2591-2600
Number of pages: 10
Related Papers (50 in total)
  • [1] Tian, Yanzhi; Li, Xiang; Liu, Zeming; Guo, Yuhang; Wang, Bin. In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model. Findings of the Association for Computational Linguistics (EMNLP 2023), 2023: 15046-15057.
  • [2] Zheng, Zaixiang; Zhou, Hao; Huang, Shujian; Chen, Jiajun; Xu, Jingjing; Li, Lei. Duplex Sequence-to-Sequence Learning for Reversible Machine Translation. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [3] Guo, Junliang; Xu, Linli; Chen, Enhong. Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation. 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020: 376-385.
  • [4] Huang, Wen-Chin; Hayashi, Tomoki; Wu, Yi-Chiao; Kameoka, Hirokazu; Toda, Tomoki. Pretraining Techniques for Sequence-to-Sequence Voice Conversion. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 745-755.
  • [5] Ulcar, Matej; Robnik-Sikonja, Marko. Sequence-to-sequence pretraining for a less-resourced Slovenian language. Frontiers in Artificial Intelligence, 2023, 6.
  • [6] Reid, Machel; Artetxe, Mikel. PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 800-810.
  • [7] Liu, Lemao; Zhu, Muhua; Shi, Shuming. Improving Sequence-to-Sequence Constituency Parsing. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 4873-4880.
  • [8] Quoc Truong Do; Sakti, Sakriani; Nakamura, Satoshi. Sequence-to-Sequence Models for Emphasis Speech Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26 (10): 1873-1883.
  • [9] Zhong, Qihuang; Ding, Liang; Liu, Juhua; Du, Bo; Tao, Dacheng. E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation. IEEE Transactions on Knowledge and Data Engineering, 2024, 36 (12): 8037-8050.
  • [10] Verissimo, Vinicius; Silva, Cecilia; Hanael, Vitor; Moraes, Caio; Costa, Rostand; Maritan, Tiago; Aschoff, Manuella; Gaudencio, Thais. A Study on the Use of Sequence-to-Sequence Neural Networks for Automatic Translation of Brazilian Portuguese to LIBRAS. WebMedia 2019: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web, 2019: 101-108.