Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation

Cited by: 13
Authors
Guo, Junliang [1 ]
Zhang, Zhirui [2 ]
Xu, Linli [1 ,3 ]
Chen, Boxing [2 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
[2] Alibaba Damo Acad, Hangzhou 310052, Peoples R China
[3] IFLYTEK Co Ltd, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptation models; Bit error rate; Task analysis; Decoding; Machine translation; Natural languages; Training; Pre-trained language model; adapter; neural machine translation;
DOI
10.1109/TASLP.2021.3076863
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Large-scale pre-trained language models (e.g., BERT) have attracted great attention in recent years. It is straightforward to fine-tune them on natural language understanding tasks such as text classification; however, effectively and efficiently incorporating them into natural language generation tasks such as neural machine translation remains a challenging problem. In this paper, we integrate two pre-trained BERT models, from the source and target language domains, into a sequence-to-sequence model by introducing lightweight adapter modules. The adapters are inserted between BERT layers and tuned on downstream tasks, while the parameters of the BERT models are kept fixed during fine-tuning. As pre-trained language models are usually very deep, inserting adapters into all layers would introduce a considerable number of new parameters. To deal with this problem, we introduce latent variables, learned during fine-tuning, that decide whether to use an adapter in each layer. In this way, the model automatically determines which adapters to use, greatly improving parameter efficiency and decoding speed. We evaluate the proposed framework on various neural machine translation tasks. Equipped with parallel sequence decoding, our model consistently outperforms autoregressive baselines while reducing inference latency by half. With automatic adapter selection, it achieves a further 20% speedup while still outperforming autoregressive baselines. When applied to autoregressive decoding, the proposed model also achieves performance comparable to state-of-the-art baseline models.
Pages: 1740-1751
Number of pages: 12
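
To make the adapter-plus-gating idea in the abstract concrete, the sketch below (Python/PyTorch) shows a bottleneck adapter wrapped around a frozen pre-trained layer, with a per-layer gate implemented as a plain sigmoid over a learned logit. This is a minimal illustration under stated assumptions, not the authors' released implementation: the class and parameter names (Adapter, GatedAdapterLayer, bottleneck_dim, gate_logit) are invented for the example, and the sigmoid relaxation stands in for whatever latent-variable estimator the paper actually uses.

    # Minimal sketch of a gated ("adaptive") adapter around a frozen pre-trained layer.
    # Names and the plain-sigmoid gate are illustrative assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, non-linearity, up-project, residual connection."""
        def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.up = nn.Linear(bottleneck_dim, hidden_dim)
            self.act = nn.GELU()

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return h + self.up(self.act(self.down(h)))

    class GatedAdapterLayer(nn.Module):
        """Wraps one frozen pre-trained layer with an adapter whose use is controlled by a
        learned per-layer gate (a sigmoid relaxation of a binary latent variable)."""
        def __init__(self, pretrained_layer: nn.Module, hidden_dim: int, bottleneck_dim: int = 64):
            super().__init__()
            self.pretrained_layer = pretrained_layer
            for p in self.pretrained_layer.parameters():
                p.requires_grad = False  # pre-trained weights stay fixed during fine-tuning
            self.adapter = Adapter(hidden_dim, bottleneck_dim)
            self.gate_logit = nn.Parameter(torch.zeros(1))  # latent "use this adapter?" variable

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            out = self.pretrained_layer(h)
            if isinstance(out, tuple):  # e.g. a Hugging Face BertLayer returns a tuple
                out = out[0]
            g = torch.sigmoid(self.gate_logit)  # relaxed gate in (0, 1)
            # Mix the adapted and unadapted outputs according to the gate.
            return g * self.adapter(out) + (1.0 - g) * out

    # Usage with a stand-in Transformer layer (in place of an actual BERT layer):
    layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    wrapped = GatedAdapterLayer(layer, hidden_dim=768)
    hidden = wrapped(torch.randn(2, 16, 768))  # (batch, seq_len, hidden_dim)
    print(hidden.shape)

At inference time, layers whose gate has been pushed close to zero can skip their adapter entirely, which is where the parameter-efficiency and decoding-speed gains described in the abstract would come from.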