Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Cited: 0
Authors
Wang, Wenxuan [1 ]
Jiao, Wenxiang [2 ]
Hao, Yongchang [3 ]
Wang, Xing [2 ]
Shi, Shuming [2 ]
Tu, Zhaopeng [2 ]
Lyu, Michael R. [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA 98004 USA
[3] Univ Alberta, Edmonton, AB, Canada
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022
Keywords
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a substantial step toward better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., domain discrepancy) and induce an over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness over Seq2Seq pretraining.
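The abstract describes "input adaptation" only at a high level. As an illustrative sketch, not the authors' exact method, one way to bridge the objective discrepancy is to mix pretraining-style noised source inputs into the finetuning data, so the model also sees denoising-style inputs during NMT finetuning. The `span_mask` noising function, the ratios, and the mixing scheme below are all assumptions for illustration:

```python
import random

MASK = "<mask>"  # placeholder token, as used in denoising pretraining

def span_mask(tokens, mask_ratio=0.35, mean_span=3.5, rng=random):
    """Replace random spans of tokens with a single <mask> token,
    mimicking the span-masking noise of Seq2Seq denoising pretraining.
    (Hypothetical parameters; not the paper's exact configuration.)"""
    out, i, n = [], 0, len(tokens)
    budget = int(n * mask_ratio)  # rough cap on masked tokens
    while i < n:
        if budget > 0 and rng.random() < mask_ratio:
            # draw a span length, clipped to the remaining budget
            span = max(1, min(budget, int(rng.expovariate(1 / mean_span)) + 1))
            out.append(MASK)
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

def adapt_batch(pairs, noise_frac=0.5, rng=random):
    """Input adaptation (sketch): alongside each clean (src, tgt) pair,
    probabilistically add a copy whose source side is noised like a
    pretraining input, keeping the clean target as supervision."""
    adapted = list(pairs)
    for src, tgt in pairs:
        if rng.random() < noise_frac:
            adapted.append((span_mask(src, rng=rng), tgt))
    return adapted
```

With `noise_frac=1.0`, every finetuning pair is accompanied by a noised-source copy; smaller values interpolate between plain finetuning and fully denoising-style input.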
Pages: 2591-2600
Page count: 10