AN EMPIRICAL STUDY OF TRANSFORMER-BASED NEURAL LANGUAGE MODEL ADAPTATION

Cited: 0
Authors
Li, Ke [1 ,2 ]
Liu, Zhe [1 ]
He, Tianxing [3 ]
Huang, Hongzhao [1 ]
Peng, Fuchun [1 ]
Povey, Daniel
Khudanpur, Sanjeev [2 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21218 USA
[3] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
neural language model; language model adaptation; Transformer; linear interpolation; automatic speech recognition
DOI
10.1109/icassp40776.2020.9053399
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We explore two approaches to adapting deep Transformer-based neural language models (LMs) for automatic speech recognition. The first is a pretrain-finetune framework: we pretrain a Transformer LM from scratch on a large-scale text corpus and then adapt it to relatively small target domains via finetuning. The second is a mixer of dynamically weighted models trained separately on the source and target domains, which aims to improve on simple linear interpolation by weighting the component models dynamically. We compare the two approaches against three baselines (no adaptation, merging data, and simple interpolation) on Switchboard (SWBD) and Wall Street Journal (WSJ). Experiments show that the mixer model generally outperforms both the baselines and finetuning. Compared with no adaptation, finetuning and the mixer approach obtain up to 11.5% and 14.1% relative WER reductions on SWBD, respectively. The mixer model also outperforms linear interpolation and merging data. On WSJ, the mixer approach achieves a new state-of-the-art WER result.
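For both simple interpolation and the mixer, the core operation is a convex combination of the next-word distributions produced by the source- and target-domain LMs. The plain-Python sketch below is illustrative only: the fixed-weight call corresponds to simple linear interpolation, and the sigmoid-of-context weight is one plausible, hypothetical way to make the weight dynamic per time step; the function names and parameterization are assumptions, not the paper's implementation.

    import math

    def interpolate(p_src, p_tgt, lam):
        """Convex combination of two next-word distributions.

        p_src, p_tgt: probability vectors over a shared vocabulary, one
        from the source-domain LM and one from the target-domain LM.
        lam: mixture weight in [0, 1]. Simple linear interpolation tunes
        a single fixed lam on held-out data; a dynamic mixer instead
        predicts a fresh weight at every step from the running context.
        """
        return [lam * ps + (1.0 - lam) * pt for ps, pt in zip(p_src, p_tgt)]

    def dynamic_weight(hidden, w, b):
        """Hypothetical per-step weight: a sigmoid of a linear map of a
        context vector (e.g., a Transformer hidden state). One plausible
        parameterization, not necessarily the paper's."""
        z = sum(wi * hi for wi, hi in zip(w, hidden)) + b
        return 1.0 / (1.0 + math.exp(-z))

    # Toy usage: two 3-word distributions mixed with a fixed weight 0.3.
    print(interpolate([0.7, 0.2, 0.1], [0.1, 0.3, 0.6], 0.3))
    # -> [0.28, 0.27, 0.45]  (still a valid distribution: sums to 1)

Because the result remains a normalized distribution for any weight in [0, 1], the mixer can vary the weight per token without breaking probabilistic scoring.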
Pages: 7934-7938
Number of pages: 5