Directed Automatic Speech Transcription Error Correction Using Bidirectional LSTM

Cited by: 0
Authors
Zheng, Da [1 ]
Chen, Zhehuai [1 ]
Wu, Yue [1 ]
Yu, Kai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, SpeechLab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016
Keywords
speech transcription; speech recognition; error correction; human-computer interaction;
DOI
Not available
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202 ;
Abstract
In automatic speech recognition (ASR), error correction after the initial search stage is a commonly used technique to improve performance. While fully automatic error correction, such as a complete second-pass rescoring with complex language models, is widely used, directed error correction, where the error locations are given manually, is of great interest in many scenarios. Previous work on directed error correction usually uses the error location information to modify the search space of the original ASR models. In this paper, a novel deep-learning-based score combination approach is proposed for directed error correction. Here, a bidirectional LSTM (BLSTM) language model is trained to estimate unnormalized sentence completion scores. These completion scores are then combined with the confusion network scores from the initial search stage for hypothesis rescoring. Experiments showed that the BLSTM-based language model achieved better results than not only simpler models such as the bidirectional n-gram or unidirectional LSTM, but also human prediction. On a real-world Chinese ASR task, the proposed approach was also shown to significantly outperform choosing the second-best hypothesis in the error sausages of the confusion networks.
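The score-combination idea described in the abstract can be illustrated with a minimal sketch: at a manually marked error slot, each candidate word's confusion-network posterior is log-linearly interpolated with a language-model completion score computed on the sentence with that candidate filled in. The sausage layout, the `completion_score` callback, and the interpolation weight `lam` below are all illustrative assumptions, not the paper's exact formulation.

```python
import math

def rescore_slot(sausage, slot, completion_score, lam=0.5):
    """Pick the best word at an error slot by log-linearly combining the
    confusion-network posterior with an LM completion score.

    `sausage` is a list of slots; each slot is a list of (word, posterior)
    pairs. `completion_score(words, slot)` returns a log-domain score for
    the full sentence with the candidate filled in. `lam` is a hypothetical
    interpolation weight; the paper's exact combination may differ."""
    best_word, best_score = None, -math.inf
    for word, posterior in sausage[slot]:
        # Fill the rest of the sentence with each slot's 1-best word.
        words = [alts[0][0] for alts in sausage]
        words[slot] = word
        score = lam * math.log(posterior) + (1 - lam) * completion_score(words, slot)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy example (illustrative, not from the paper): a four-slot sausage where
# the user has marked slot 1 ("red" vs. "read") as the error location.
sausage = [[("i", 1.0)],
           [("red", 0.6), ("read", 0.4)],
           [("a", 1.0)],
           [("book", 1.0)]]

def toy_completion_score(words, slot):
    # Stand-in for the BLSTM completion score: a log-domain score that
    # rewards "read" in the bidirectional context "i _ a book".
    return 0.0 if words[slot] == "read" else -5.0

print(rescore_slot(sausage, 1, toy_completion_score))  # → read
```

With `lam=0.5`, the LM's bidirectional context overrides the confusion network's 1-best ("red"), which is exactly the behavior directed error correction exploits: the user supplies the slot, and the combined score picks a better candidate than the acoustic posterior alone.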
Pages: 5
References
23 records in total
[1]  
Aleksic R, 2015, P INTERSPEECH
[2]  
Amodei D, 2015, Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, arXiv:1512.02595
[3]  
Ananthakrishnan S, 2007, INT CONF ACOUST SPEE, P873
[4]  
Berglund M, 2015, ADV NEURAL INFORM PR, V1, P856
[5]  
Evermann Gunnar, 2000, P NIST SPEECH TRANSC, V27, P78
[6]   Learning precise timing with LSTM recurrent networks [J].
Gers, FA ;
Schraudolph, NN ;
Schmidhuber, J .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (01) :115-143
[7]   Minimum Bayes-risk automatic speech recognition [J].
Goel, V ;
Byrne, WJ .
COMPUTER SPEECH AND LANGUAGE, 2000, 14 (02) :115-135
[8]  
Graves A, 2004, LECT NOTES COMPUT SC, V3141, P127
[9]   Deep Neural Networks for Acoustic Modeling in Speech Recognition [J].
Hinton, Geoffrey ;
Deng, Li ;
Yu, Dong ;
Dahl, George E. ;
Mohamed, Abdel-rahman ;
Jaitly, Navdeep ;
Senior, Andrew ;
Vanhoucke, Vincent ;
Patrick Nguyen ;
Sainath, Tara N. ;
Kingsbury, Brian .
IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) :82-97
[10]  
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI 10.1162/neco.1997.9.8.1735