Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Cited: 0
Authors
Wang, Wenxuan [1 ]
Jiao, Wenxiang [2 ]
Hao, Yongchang [3 ]
Wang, Xing [2 ]
Shi, Shuming [2 ]
Tu, Zhaopeng [2 ]
Lyu, Michael R. [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA 98004 USA
[3] Univ Alberta, Edmonton, AB, Canada
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022
Keywords: None listed
DOI: Not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we present a substantial step toward better understanding the state-of-the-art (SOTA) sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on the one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT fine-tuning limit translation quality (i.e., domain discrepancy) and induce an over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness over Seq2Seq pretraining.
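The abstract only names the two proposed remedies; the record includes no code. As a rough, assumption-laden sketch of the input-adaptation idea (keeping a fraction of fine-tuning inputs close to the noised inputs seen during denoising-style Seq2Seq pretraining), consider the Python fragment below. The function names (mask_spans, adapt_batch), the span-masking heuristic, and all parameter values are hypothetical illustrations, not the authors' implementation.

```python
import random

MASK = "<mask>"  # placeholder token, mimicking BART/mBART-style mask tokens

def mask_spans(tokens, mask_ratio=0.35, mean_span=3, seed=0):
    """Replace random token spans with a single <mask> token, loosely
    mimicking the denoising input used in Seq2Seq pretraining.
    Hypothetical heuristic, not the paper's recipe."""
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # rough cap on masked tokens
    out, i, masked = [], 0, 0
    while i < len(tokens):
        if masked < budget and rng.random() < mask_ratio:
            span = max(1, int(rng.expovariate(1.0 / mean_span)))
            out.append(MASK)  # the whole span collapses to one mask token
            i += span
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out

def adapt_batch(src_tokens, tgt_tokens, noise_prob=0.3, seed=0):
    """With probability noise_prob, pair a noised source with the clean
    target, so some fine-tuning inputs resemble pretraining inputs."""
    rng = random.Random(seed)
    if rng.random() < noise_prob:
        return mask_spans(src_tokens, seed=seed), tgt_tokens
    return src_tokens, tgt_tokens

if __name__ == "__main__":
    src = "we study seq2seq pretraining for neural machine translation".split()
    tgt = "wir untersuchen seq2seq vortraining".split()
    print(adapt_batch(src, tgt, noise_prob=1.0))  # always noised, for demo
```

In a real system the corruption would be applied at the subword level inside the fine-tuning data loader; the sketch only makes the bridging idea concrete.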
Pages: 2591-2600
Page count: 10
Related Papers
50 items in total (entries [21]-[30] shown)
  • [21] Sequence-to-Dependency Neural Machine Translation
    Wu, Shuangzhi
    Zhang, Dongdong
    Yang, Nan
    Li, Mu
    Zhou, Ming
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 698 - 707
  • [22] EFFECT OF DATA REDUCTION ON SEQUENCE-TO-SEQUENCE NEURAL TTS
    Latorre, Javier
    Lachowicz, Jakub
    Lorenzo-Trueba, Jaime
    Merritt, Thomas
    Drugman, Thomas
    Ronanki, Srikanth
    Klimkov, Viacheslav
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7075 - 7079
  • [23] Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
    Zhou, Wangchunshu
    Ge, Tao
    Xu, Canwen
    Xu, Ke
    Wei, Furu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 571 - 582
  • [24] Prediction Model Design for Vibration Severity of Rotating Machine Based on Sequence-to-Sequence Neural Network
    Wang, Zhiqiang
    Qian, Hong
    Zhang, Dongliang
    Wei, Yingchen
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2019, 2019
  • [25] INVESTIGATION OF AN INPUT SEQUENCE ON THAI NEURAL SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS
    Janyoi, Pongsathon
    Thangthai, Ausdang
    2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 218 - 223
  • [26] "Found in Translation": predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models
    Schwaller, Philippe
    Gaudin, Theophile
    Lanyi, David
    Bekas, Costas
    Laino, Teodoro
    CHEMICAL SCIENCE, 2018, 9 (28) : 6091 - 6098
  • [27] Improving AMR Parsing with Sequence-to-Sequence Pre-training
    Xu, Dongqin
    Li, Junhui
    Zhu, Muhua
    Zhang, Min
    Zhou, Guodong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2501 - 2511
  • [28] Understanding Subtitles by Character-Level Sequence-to-Sequence Learning
    Zhang, Haijun
    Li, Jingxuan
    Ji, Yuzhu
    Yue, Heng
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2017, 13 (02) : 616 - 624
  • [29] Improving Sequence-to-sequence Tibetan Speech Synthesis with Prosodic Information
    Zhang, Weizhao
    Yang, Hongwu
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (09)
  • [30] De-duping URLs with Sequence-to-Sequence Neural Networks
    Xu, Keyang
    Liu, Zhengzhong
    Callan, Jamie
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 1157 - 1160