AN EMPIRICAL STUDY OF TRANSFORMER-BASED NEURAL LANGUAGE MODEL ADAPTATION

Cited by: 0
Authors
Li, Ke [1 ,2 ]
Liu, Zhe [1 ]
He, Tianxing [3 ]
Huang, Hongzhao [1 ]
Peng, Fuchun [1 ]
Povey, Daniel
Khudanpur, Sanjeev [2 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[3] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
neural language model; language model adaptation; Transformer; linear interpolation; automatic speech recognition;
DOI
10.1109/icassp40776.2020.9053399
CLC classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We explore two adaptation approaches for deep Transformer-based neural language models (LMs) in automatic speech recognition. The first is a pretrain-finetune framework: we pretrain a Transformer LM from scratch on a large-scale text corpus and then adapt it to relatively small target domains via finetuning. The second is a mixer of dynamically weighted models trained separately on the source and target domains, which aims to improve on simple linear interpolation through dynamic weighting. We compare the two approaches against three baselines (no adaptation, data merging, and simple linear interpolation) on Switchboard (SWBD) and Wall Street Journal (WSJ). Experiments show that the mixer model generally outperforms both the baselines and finetuning. Compared with no adaptation, finetuning and the mixer approach achieve relative WER reductions of up to 11.5% and 14.1% on SWBD, respectively. The mixer model also outperforms linear interpolation and data merging. On WSJ, the mixer approach achieves a new state-of-the-art WER result.
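To make the mixer idea concrete, below is a minimal PyTorch sketch of dynamically weighted interpolation between a source-domain LM and a target-domain LM. It assumes each component LM returns per-position log-probabilities and hidden states; the DynamicMixerLM class, its sigmoid gating network, and this interface are hypothetical illustrations, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class DynamicMixerLM(nn.Module):
        # Hypothetical sketch: interpolate a source-domain LM and a
        # target-domain LM with a token-dependent weight, generalizing
        # fixed linear interpolation p = lam * p_src + (1 - lam) * p_tgt.
        def __init__(self, source_lm, target_lm, hidden_dim):
            super().__init__()
            self.source_lm = source_lm  # e.g. pretrained on the source corpus
            self.target_lm = target_lm  # e.g. trained on the target domain
            # Gating network: predicts a per-position weight in (0, 1)
            # from the two LMs' hidden states (assumed interface).
            self.gate = nn.Sequential(nn.Linear(2 * hidden_dim, 1), nn.Sigmoid())

        def forward(self, tokens):
            # Assumes each LM returns (log_probs, hidden_states), shaped
            # (batch, time, vocab) and (batch, time, hidden_dim).
            src_logp, src_h = self.source_lm(tokens)
            tgt_logp, tgt_h = self.target_lm(tokens)
            lam = self.gate(torch.cat([src_h, tgt_h], dim=-1))  # (B, T, 1)
            probs = lam * src_logp.exp() + (1.0 - lam) * tgt_logp.exp()
            return torch.log(probs.clamp_min(1e-12))  # mixed log-probs

With a constant lam this reduces to the simple linear-interpolation baseline; letting the gate condition on context is what makes the weighting dynamic.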
Pages: 7934 - 7938
Page count: 5