Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation

Cited by: 13
Authors
Guo, Junliang [1 ]
Zhang, Zhirui [2 ]
Xu, Linli [1 ,3 ]
Chen, Boxing [2 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei 230026, Peoples R China
[2] Alibaba Damo Acad, Hangzhou 310052, Peoples R China
[3] IFLYTEK Co Ltd, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptation models; Bit error rate; Task analysis; Decoding; Machine translation; Natural languages; Training; Pre-trained language model; adapter; neural machine translation;
DOI
10.1109/TASLP.2021.3076863
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Large-scale pre-trained language models (e.g., BERT) have attracted great attention in recent years. It is straightforward to fine-tune them on natural language understanding tasks such as text classification; however, effectively and efficiently incorporating them into natural language generation tasks such as neural machine translation remains challenging. In this paper, we integrate two pre-trained BERT models, from the source and target language domains, into a sequence-to-sequence model by introducing lightweight adapter modules. The adapters are inserted between BERT layers and tuned on the downstream task, while the parameters of the BERT models are kept fixed during fine-tuning. Because pre-trained language models are usually very deep, inserting adapters into all layers would introduce a considerable number of new parameters. To deal with this problem, we introduce latent variables, learned during fine-tuning, that decide whether to use the adapter in each layer. In this way, the model automatically determines which adapters to use, which greatly improves parameter efficiency and decoding speed. We evaluate the proposed framework on various neural machine translation tasks. Equipped with parallel sequence decoding, our model consistently outperforms autoregressive baselines while reducing inference latency by half. With automatic adapter selection, the proposed model achieves a further 20% speedup while still outperforming autoregressive baselines. When applied to autoregressive decoding, the proposed model also achieves performance comparable to state-of-the-art baselines.
Pages: 1740-1751
Number of pages: 12
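
The gated-adapter idea described in the abstract (a bottleneck adapter after each frozen BERT layer, with a latent per-layer variable deciding whether the adapter is used) can be illustrated by the minimal sketch below. This is an assumption-laden illustration, not the authors' released code: the module and parameter names are hypothetical, and the Gumbel-sigmoid relaxation of the binary gate is one common way such latent variables are trained by gradient descent.

# Minimal sketch (assumptions, not the paper's implementation): a bottleneck adapter
# placed after a frozen BERT layer, gated by a latent binary variable that is relaxed
# with a Gumbel-sigmoid during fine-tuning and hardened at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64, temperature: float = 1.0):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # down-projection
        self.up = nn.Linear(bottleneck, hidden_size)     # up-projection
        self.gate_logit = nn.Parameter(torch.zeros(1))   # latent "use this adapter?" variable
        self.temperature = temperature

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transformation, as in standard adapter tuning.
        adapted = hidden + self.up(F.relu(self.down(hidden)))
        if self.training:
            # Gumbel-sigmoid relaxation: a soft, differentiable gate during fine-tuning.
            noise = torch.rand_like(self.gate_logit).clamp(1e-6, 1 - 1e-6)
            logit = self.gate_logit + torch.log(noise) - torch.log(1 - noise)
            gate = torch.sigmoid(logit / self.temperature)
        else:
            # At inference the gate is hard: layers whose adapters are "off" skip the
            # extra computation, which is where the claimed speedup comes from.
            gate = (self.gate_logit > 0).float()
        return gate * adapted + (1 - gate) * hidden


# Usage sketch: each (frozen) BERT layer's output would pass through its own gated adapter.
if __name__ == "__main__":
    layer_output = torch.randn(2, 16, 768)  # (batch, sequence length, hidden size)
    adapter = GatedAdapter()
    print(adapter(layer_output).shape)      # torch.Size([2, 16, 768])

Only the adapter (and gate) parameters would receive gradients; the surrounding BERT layers stay fixed, which matches the parameter-efficiency argument in the abstract.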