A General Procedure for Improving Language Models in Low-Resource Speech Recognition

Cited by: 0
Authors
Liu, Qian [1]
Zhang, Wei-Qiang [1]
Liu, Jia [1]
Liu, Yao [2]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
[2] China Gen Technol Res Inst, Beijing 100084, Peoples R China
Source
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP) | 2019
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
language modeling; speech recognition; low-resource languages; data augmentation;
DOI
10.1109/ialp48816.2019.9037726
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is difficult for a language model (LM) to perform well with only limited in-domain transcripts in low-resource speech recognition. In this paper, we summarize and extend several effective methods for making the most of out-of-domain data to improve LMs, including data selection, vocabulary expansion, lexicon augmentation, and multi-model fusion. These methods are integrated into a systematic procedure that proves effective for improving both n-gram and neural network LMs. Additionally, word vectors pre-trained on out-of-domain data are utilized to improve the performance of the RNN/LSTM LMs used for rescoring first-pass decoding results. Experiments on five Asian languages from the Babel Build Packs show that, after the LMs are improved, a relative word error rate (WER) reduction of 5.4-7.6% is generally achieved compared to the baseline ASR systems. For some languages, we achieve lower WER than recently published results on the same data sets.
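The abstract names data selection as the first step for exploiting out-of-domain text but does not spell out the criterion here. A common choice for this task is the cross-entropy difference method of Moore and Lewis (2010); the sketch below illustrates it in Python under that assumption: each sentence in the out-of-domain pool is scored by how much better an in-domain LM explains it than an out-of-domain LM does. The unigram models, add-one smoothing, and the names train_unigram, select, and keep_ratio are illustrative simplifications, not details from the paper.

import math
from collections import Counter

def train_unigram(sentences):
    # Unigram LM with add-one smoothing; unseen words share one pseudo-count.
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts)
    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab_size + 1))
    return logprob

def cross_entropy(logprob, sentence):
    # Per-word negative log-likelihood of the sentence under the LM.
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)

def select(in_domain, out_of_domain, keep_ratio=0.5):
    # Moore-Lewis score: H_in(s) - H_out(s). A lower score means the
    # sentence looks more in-domain, so keep the lowest-scoring fraction.
    lp_in = train_unigram(in_domain)
    lp_out = train_unigram(out_of_domain)
    scored = sorted(out_of_domain,
                    key=lambda s: cross_entropy(lp_in, s) - cross_entropy(lp_out, s))
    return scored[:int(len(scored) * keep_ratio)]

In practice the two models would be trained on the limited in-domain transcripts and a comparably sized sample of the out-of-domain text, with the kept fraction tuned on held-out perplexity.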
Pages: 428-433
Page count: 6