IMPROVING SPEECH RECOGNITION ERROR PREDICTION FOR MODERN AND OFF-THE-SHELF SPEECH RECOGNIZERS

Cited by: 0
Authors
Serai, Prashant [1]
Wang, Peidong [1]
Fosler-Lussier, Eric [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Funding
U.S. National Science Foundation
Keywords
Speech Recognition; Error Prediction; Low Resource; Sequence to Sequence Neural Networks; Simulated ASR Errors;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks such as discriminative language modeling and improving the robustness of NLP systems when limited or even no audio data is available at training time. Previous work typically focused on replicating the behavior of GMM-HMM based systems, but more modern posterior-based neural network acoustic models behave differently and require adjustments to the error prediction model. In this work, we extend a prior phonetic confusion based model for predicting speech recognition errors in two ways. First, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model. Second, we investigate replacing the confusion matrix with a sequence-to-sequence model in order to introduce context dependency into the prediction. We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data (Fisher), and then by using that same predictor to estimate the behavior of an unrelated cloud-based ASR system on a novel task. Sampling greatly improves predictive accuracy within a 100-guess paradigm, while the sequence model performs similarly to the confusion matrix.
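The contrast the abstract draws between a confusion-matrix prediction and a sampling-based one can be illustrated with a minimal sketch. This is not the authors' model: the phone inventory, the probabilities in `CONFUSION`, and the function names are all hypothetical, and the record does not describe the paper's actual sampling procedure. The sketch only shows the general idea that sampling from per-phone confusion distributions yields many distinct guesses, whereas always taking the most probable output yields one.

```python
import random

# Hypothetical toy confusion probabilities P(recognized phone | true phone).
# These numbers are invented for illustration; they are not from the paper.
CONFUSION = {
    "p": {"p": 0.80, "b": 0.15, "t": 0.05},
    "b": {"b": 0.70, "p": 0.25, "d": 0.05},
    "t": {"t": 0.85, "d": 0.10, "p": 0.05},
}


def most_likely(phones):
    """Deterministic baseline: take the single most probable output per phone."""
    return [max(CONFUSION[p], key=CONFUSION[p].get) for p in phones]


def sample_once(phones, rng):
    """Draw one simulated recognizer output by sampling each phone independently."""
    out = []
    for p in phones:
        candidates = list(CONFUSION[p])
        weights = [CONFUSION[p][c] for c in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return out


def sample_guesses(phones, n_guesses=100, seed=0):
    """Return n_guesses sampled phone sequences (assumed analogue of a 100-guess setup)."""
    rng = random.Random(seed)
    return [sample_once(phones, rng) for _ in range(n_guesses)]


if __name__ == "__main__":
    truth = ["p", "b", "t"]
    guesses = sample_guesses(truth)
    print("most likely:", most_likely(truth))
    print("distinct sampled guesses:", len({tuple(g) for g in guesses}))
```

With these toy probabilities the deterministic baseline simply returns the input sequence, so it predicts no errors at all, while the sampled guesses include sequences with substitutions. A context-dependent predictor such as the sequence-to-sequence model mentioned in the abstract would replace the independent per-phone lookup, which this sketch does not attempt.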
Pages: 7255-7259
Number of pages: 5