AN EMPIRICAL STUDY OF TRANSFORMER-BASED NEURAL LANGUAGE MODEL ADAPTATION

Cited by: 0
Authors
Li, Ke [1 ,2 ]
Liu, Zhe [1 ]
He, Tianxing [3 ]
Huang, Hongzhao [1 ]
Peng, Fuchun [1 ]
Povey, Daniel
Khudanpur, Sanjeev [2 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025, USA
[2] Johns Hopkins University, Center for Language & Speech Processing, Baltimore, MD 21218, USA
[3] MIT, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Keywords
neural language model; language model adaptation; Transformer; linear interpolation; automatic speech recognition;
DOI
10.1109/icassp40776.2020.9053399
CLC Number (Chinese Library Classification)
O42 [Acoustics];
Discipline Classification Code
070206; 082403
Abstract
We explore two adaptation approaches for deep Transformer-based neural language models (LMs) in automatic speech recognition. The first is a pretrain-finetune framework: we pretrain a Transformer LM from scratch on a large-scale text corpus and then adapt it to relatively small target domains via finetuning. The second is a mixer of dynamically weighted models trained separately on the source and target domains, which aims to improve on simple linear interpolation by making the interpolation weights dynamic. We compare the two approaches against three baselines (no adaptation, data merging, and simple linear interpolation) on Switchboard (SWBD) and Wall Street Journal (WSJ). Experiments show that the mixer model generally outperforms both the baselines and finetuning. Compared with no adaptation, finetuning and the mixer approach obtain relative WER reductions of up to 11.5% and 14.1% on SWBD, respectively. The mixer model also outperforms linear interpolation and data merging. On WSJ, the mixer approach achieves a new state-of-the-art WER result.
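To illustrate the distinction the abstract draws between simple linear interpolation and the dynamically weighted mixer, the sketch below contrasts a fixed interpolation weight with a per-token weight predicted from a context vector. This is a minimal illustration only, not the paper's implementation; class and parameter names such as DynamicMixer, gate, and context_dim are assumptions introduced here for exposition.

```python
# Minimal sketch (illustrative only): fixed-weight linear interpolation of two
# LM output distributions vs. a dynamically weighted mixture whose weight is
# predicted per token from a context vector (e.g., a Transformer hidden state).
import torch
import torch.nn as nn


def linear_interpolation(p_source, p_target, lam=0.5):
    """Fixed-weight interpolation of two next-word distributions."""
    return lam * p_source + (1.0 - lam) * p_target


class DynamicMixer(nn.Module):
    """Hypothetical mixer: predicts a per-token weight from the context and
    mixes the source-domain and target-domain LM distributions with it."""

    def __init__(self, context_dim):
        super().__init__()
        self.gate = nn.Linear(context_dim, 1)  # scalar weight per position

    def forward(self, context, p_source, p_target):
        lam = torch.sigmoid(self.gate(context))       # (batch, 1), in (0, 1)
        return lam * p_source + (1.0 - lam) * p_target


if __name__ == "__main__":
    vocab, dim = 1000, 512
    # Stand-ins for the softmax outputs of two separately trained LMs.
    p_src = torch.softmax(torch.randn(4, vocab), dim=-1)
    p_tgt = torch.softmax(torch.randn(4, vocab), dim=-1)
    ctx = torch.randn(4, dim)  # e.g., a hidden state summarizing the history

    p_static = linear_interpolation(p_src, p_tgt, lam=0.5)
    p_dynamic = DynamicMixer(dim)(ctx, p_src, p_tgt)
    print(p_static.sum(dim=-1), p_dynamic.sum(dim=-1))  # both rows sum to 1
```

Because the dynamic weight lies in (0, 1) and both inputs are proper distributions, the mixed output remains a valid distribution; the difference from simple interpolation is only that the weight can vary with the history rather than being tuned once on held-out data.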
Pages: 7934-7938
Page count: 5
Related Papers
50 records in total
  • [41] Ouroboros: On Accelerating Training of Transformer-Based Language Models
    Yang, Qian
    Huo, Zhouyuan
    Wang, Wenlin
    Huang, Heng
    Carin, Lawrence
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [42] Transformer-Based Language Models for Software Vulnerability Detection
    Thapa, Chandra
    Jang, Seung Ick
    Ahmed, Muhammad Ejaz
    Camtepe, Seyit
    Pieprzyk, Josef
    Nepal, Surya
    PROCEEDINGS OF THE 38TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2022, 2022, : 481 - 496
  • [43] A Comparison of Transformer-Based Language Models on NLP Benchmarks
    Greco, Candida Maria
    Tagarelli, Andrea
    Zumpano, Ester
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 490 - 501
  • [44] Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?
    Martin, R. H. J.
    Visser, R.
    Dunaiski, M.
    SOUTH AFRICAN COMPUTER SCIENCE AND INFORMATION SYSTEMS RESEARCH TRENDS, SAICSIT 2024, 2024, 2159 : 95 - 123
  • [45] RadBERT: Adapting Transformer-based Language Models to Radiology
    Yan, An
    McAuley, Julian
    Lu, Xing
    Du, Jiang
    Chang, Eric Y.
    Gentili, Amilcare
    Hsu, Chun-Nan
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2022, 4 (04)
  • [46] Applications of transformer-based language models in bioinformatics: a survey
    Zhang, Shuang
    Fan, Rui
    Liu, Yuti
    Chen, Shuang
    Liu, Qiao
    Zeng, Wanwen
    NEURO-ONCOLOGY ADVANCES, 2023, 5 (01)
  • [47] TAG: Gradient Attack on Transformer-based Language Models
    Deng, Jieren
    Wang, Yijue
    Li, Ji
    Wang, Chenghong
    Shang, Chao
    Liu, Hang
    Rajasekaran, Sanguthevar
    Ding, Caiwen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3600 - 3610
  • [48] Transformer-Based Deep Neural Language Modeling for Construct-Specific Automatic Item Generation
    Hommel, Bjoern E.
    Wollang, Franz-Josef M.
    Kotova, Veronika
    Zacher, Hannes
    Schmukle, Stefan C.
    PSYCHOMETRIKA, 2022, 87 (02) : 749 - 772
  • [50] A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging
    Dipongkor, Atish Kumar
    Moran, Kevin
    2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE, 2023, : 1012 - 1023