Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation

Cited by: 13
Authors
Guo, Junliang [1]
Zhang, Zhirui [2]
Xu, Linli [1,3]
Chen, Boxing [2]
Chen, Enhong [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
[2] Alibaba Damo Acad, Hangzhou 310052, Peoples R China
[3] IFLYTEK Co Ltd, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptation models; Bit error rate; Task analysis; Decoding; Machine translation; Natural languages; Training; Pre-trained language model; adapter; neural machine translation;
DOI
10.1109/TASLP.2021.3076863
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Large-scale pre-trained language models (e.g., BERT) have attracted great attention in recent years. While it is straightforward to fine-tune them on natural language understanding tasks such as text classification, incorporating them effectively and efficiently into natural language generation tasks such as neural machine translation remains a challenging problem. In this paper, we integrate two pre-trained BERT models, from the source and target language domains, into a sequence-to-sequence model by introducing light-weight adapter modules. The adapters are inserted between BERT layers and tuned on downstream tasks, while the parameters of the BERT models are kept fixed during fine-tuning. Because pre-trained language models are usually very deep, inserting adapters into every layer would introduce a considerable number of new parameters. To address this problem, we introduce latent variables, learned during fine-tuning, that decide whether to use an adapter in each layer. In this way, the model automatically determines which adapters to use, greatly improving parameter efficiency and decoding speed. We evaluate the proposed framework on various neural machine translation tasks. Equipped with parallel sequence decoding, our model consistently outperforms autoregressive baselines while reducing inference latency by half. With automatic adapter selection, the proposed model achieves a further 20% speedup while still outperforming autoregressive baselines. When applied to autoregressive decoding, the proposed model also achieves performance comparable to state-of-the-art baseline models.
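
To make the mechanism described in the abstract concrete, the following minimal sketch (not the authors' released code) shows a bottleneck adapter together with a per-layer binary latent gate that decides whether the adapter is applied. The module names, the bottleneck dimension, and the Binary-Concrete (Gumbel-sigmoid) relaxation of the gate are illustrative assumptions; PyTorch is used.

# Illustrative sketch, assuming a bottleneck adapter and a relaxed-Bernoulli gate;
# details such as sizes and the relaxation are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Bottleneck feed-forward adapter inserted between frozen BERT layers."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen BERT representation intact.
        return self.norm(x + self.up(F.relu(self.down(x))))


class GatedAdapter(nn.Module):
    """Adapter whose use is decided by a learned binary latent variable."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.adapter = Adapter(hidden_size, bottleneck)
        self.gate_logit = nn.Parameter(torch.zeros(1))  # latent "use this adapter?" variable

    def forward(self, x: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        if self.training:
            # Binary-Concrete (relaxed Bernoulli) sample keeps the gate differentiable.
            u = torch.rand_like(self.gate_logit)
            noise = torch.log(u) - torch.log1p(-u)
            gate = torch.sigmoid((self.gate_logit + noise) / tau)
        else:
            # Hard decision at inference: skip the adapter entirely when the gate is off.
            if self.gate_logit.item() <= 0.0:
                return x
            gate = torch.ones(1, device=x.device)
        return gate * self.adapter(x) + (1.0 - gate) * x


if __name__ == "__main__":
    # One gated adapter per (frozen) BERT layer; here applied to a dummy hidden state.
    layer_adapters = nn.ModuleList([GatedAdapter() for _ in range(12)])
    hidden = torch.randn(2, 16, 768)      # (batch, sequence length, hidden size)
    for gated in layer_adapters:
        hidden = gated(hidden, tau=0.5)   # in practice, run after each BERT layer
    print(hidden.shape)                   # torch.Size([2, 16, 768])

In this sketch only the adapter and gate parameters are trainable while BERT stays frozen, which is what keeps the number of tuned parameters small; at inference, layers whose gate ends up off simply skip their adapters, which is the source of the parameter and decoding-speed savings the abstract reports.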
Pages: 1740-1751
Number of pages: 12