Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation

Cited by: 13
Authors
Guo, Junliang [1 ]
Zhang, Zhirui [2 ]
Xu, Linli [1 ,3 ]
Chen, Boxing [2 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
[2] Alibaba Damo Acad, Hangzhou 310052, Peoples R China
[3] IFLYTEK Co Ltd, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptation models; Bit error rate; Task analysis; Decoding; Machine translation; Natural languages; Training; Pre-trained language model; adapter; neural machine translation;
DOI
10.1109/TASLP.2021.3076863
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Large-scale pre-trained language models (e.g., BERT) have attracted great attention in recent years. While it is straightforward to fine-tune them on natural language understanding tasks such as text classification, effectively and efficiently incorporating them into natural language generation tasks such as neural machine translation remains a challenging problem. In this paper, we integrate two pre-trained BERT models, from the source and target language domains, into a sequence-to-sequence model by introducing lightweight adapter modules. The adapters are inserted between BERT layers and tuned on downstream tasks, while the parameters of the BERT models are kept fixed during fine-tuning. Because pre-trained language models are usually very deep, inserting adapters into all layers introduces a considerable number of new parameters. To address this problem, we introduce latent variables, learned during fine-tuning, that decide whether to use the adapter in each layer. In this way, the model automatically determines which adapters to use, greatly improving parameter efficiency and decoding speed. We evaluate the proposed framework on various neural machine translation tasks. Equipped with parallel sequence decoding, our model consistently outperforms autoregressive baselines while reducing the inference latency by half. With automatic adapter selection, the proposed model achieves a further 20% speedup while still outperforming autoregressive baselines. When applied to autoregressive decoding, the proposed model also achieves performance comparable to state-of-the-art baseline models.
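The abstract describes two mechanisms: lightweight bottleneck adapters inserted between frozen BERT layers, and a per-layer latent variable, learned during fine-tuning, that decides whether that layer's adapter is used at all. The PyTorch sketch below illustrates this idea under stated assumptions; the names (GatedAdapter, gumbel_sigmoid), the bottleneck size, and the Gumbel-sigmoid relaxation of the binary gate are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a bottleneck adapter guarded by a
# learnable per-layer gate that decides whether the adapter is used.
import torch
import torch.nn as nn


def gumbel_sigmoid(logit: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Relaxed Bernoulli sample so the gate stays differentiable during training."""
    u = torch.rand_like(logit).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log(1 - u)  # logistic noise
    return torch.sigmoid((logit + noise) / tau)


class GatedAdapter(nn.Module):
    """Bottleneck adapter inserted after a frozen BERT layer, guarded by a gate."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(hidden_size)
        # One scalar latent variable per layer; its (relaxed) sample decides
        # whether this layer's adapter contributes to the output.
        self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            g = gumbel_sigmoid(self.gate_logit)      # soft gate in [0, 1]
        else:
            g = (self.gate_logit > 0).float()        # hard decision at inference
            if g.item() == 0.0:
                return hidden                        # skip the adapter entirely
        residual = hidden
        out = self.up(self.act(self.down(self.norm(hidden))))
        return residual + g * out


if __name__ == "__main__":
    layer_output = torch.randn(2, 16, 768)           # (batch, seq_len, hidden)
    adapter = GatedAdapter()
    print(adapter(layer_output).shape)               # torch.Size([2, 16, 768])
```

At inference time, a layer whose gate collapses to zero skips its adapter entirely, which is how pruning per-layer adapters can reduce both the added parameters in use and the decoding latency.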
Pages: 1740-1751
Page count: 12