HYBRID CONTEXT DEPENDENT CD-DNN-HMM KEYWORD SPOTTING (KWS) IN SPEECH CONVERSATIONS

Times cited: 0
Authors
Tyagi, Vivek [1]
Affiliation
[1] TCS Res Delhi, Delhi, India
Source
2016 IEEE 26TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2016
Keywords
Speech Recognition; Deep Neural Networks; Keyword Spotting
DOI
Not available
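Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809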
Abstract
We present a detailed analysis of the phoneme recognition performance of two acoustic models (AMs) on the WSJ speech corpus: a context-dependent tied-state triphone Gaussian Mixture Model Hidden Markov Model (CD-GMM-HMM) AM, representing a state-of-the-art large AM, and a context-dependent Deep Neural Network HMM (CD-DNN-HMM) AM with four hidden layers. Phoneme recognition experiments using a bigram phoneme language model and Viterbi decoding on an independent two-hour test set show a 33.3% relative improvement with our CD-DNN acoustic model. We then present a filler-based hybrid DNN-HMM keyword spotting (KWS) system which, to our knowledge, is the first KWS architecture to combine a context-dependent DNN with an HMM. In our experiments, a strong CD-GMM-HMM KWS baseline provides 79.0% correct detection accuracy at a false alarm (FA) rate of 5.0 FA/Hr, whereas the proposed hybrid CD-DNN-HMM KWS achieves 88.5% correct detection accuracy at 5.0 FA/Hr, a relative improvement of 43.3%. We provide further analysis and conclude that hybrid CD-DNN-HMM KWS is an attractive alternative for near-real-time KWS applications that require high detection accuracy and a low FA rate.
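The abstract states that phoneme recognition uses a bigram phoneme language model with Viterbi decoding over hybrid DNN-HMM acoustic scores. The following is a minimal sketch of that idea, not the paper's decoder: it assumes context-independent phones with one HMM state each, a fixed self-loop probability, and the usual hybrid conversion of DNN posteriors into scaled likelihoods by subtracting log priors. The function `viterbi_phone_loop` and its `self_loop` and `lm_weight` parameters are hypothetical names introduced here for illustration.

```python
import numpy as np


def viterbi_phone_loop(log_post, log_prior, log_bigram, self_loop=0.5, lm_weight=1.0):
    """Best phone sequence through a phone loop (hypothetical helper, illustration only).

    log_post:   (T, P) DNN frame log posteriors log p(phone | x_t)
    log_prior:  (P,) log phone priors, e.g. estimated from training alignments
    log_bigram: (P, P) bigram phone LM, log p(phone j | previous phone i)
    Assumes one HMM state per phone; a phone cannot be immediately
    followed by a new instance of itself.
    """
    T, P = log_post.shape
    # Hybrid trick: log p(x_t | phone) = log p(phone | x_t) - log p(phone) + const
    emit = log_post - log_prior
    # Leaving phone i for phone j costs the exit probability plus the weighted bigram score
    trans = np.log(1.0 - self_loop) + lm_weight * log_bigram
    # The diagonal is the self-loop: staying in the same phone for another frame
    np.fill_diagonal(trans, np.log(self_loop))

    delta = emit[0].copy()                  # best log score ending in each phone at frame 0
    back = np.zeros((T, P), dtype=int)      # back-pointers to the best predecessor phone
    for t in range(1, T):
        scores = delta[:, None] + trans     # scores[i, j]: best path in phone i at t-1, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emit[t]

    # Trace back from the best final phone
    states = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        states.append(int(back[t, states[-1]]))
    states.reverse()
    # Collapse runs of identical frame labels into the recognized phone sequence
    return [s for k, s in enumerate(states) if k == 0 or s != states[k - 1]]


# Toy example: 8 frames, 3 phones, uniform priors and a uniform bigram LM
post = np.full((8, 3), 0.1)
post[0:3, 0] = 0.8
post[3:6, 2] = 0.8
post[6:8, 1] = 0.8
print(viterbi_phone_loop(np.log(post), np.log(np.full(3, 1 / 3)), np.log(np.full((3, 3), 1 / 3))))
# prints [0, 2, 1]
```

Switching phones costs more than a self-loop, so the decoder only changes phone when the acoustic evidence outweighs the transition and LM penalty. A real CD-DNN-HMM system would instead score tied triphone states with multi-state phone topologies.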
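The KWS results are quoted as correct detection accuracy at a fixed operating point of 5.0 FA/Hr. The sketch below shows one way to read such an operating point from a list of putative hits by sweeping a score threshold. It assumes that each hit from the filler-based decoder already carries a confidence score (for example, a duration-normalized log-likelihood ratio between the keyword path and the competing filler path) and has been marked correct or false against the reference. The abstract does not describe the paper's exact scoring or thresholding, and `detection_at_fa_rate` and its parameters are hypothetical.

```python
def detection_at_fa_rate(hits, num_keywords, hours, max_fa_per_hour=5.0):
    """Detection rate at the lowest threshold whose FA rate stays within budget (hypothetical helper).

    hits:          list of (score, is_correct) putative keyword hits from the KWS decoder
    num_keywords:  number of true keyword occurrences in the reference transcripts
    hours:         duration of the test audio in hours
    Returns (detection_rate, fa_per_hour, threshold); threshold is inf if no hit qualifies.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    best = (0.0, 0.0, float("inf"))
    n_correct = n_false = 0
    for k, (score, is_correct) in enumerate(ranked):
        if is_correct:
            n_correct += 1
        else:
            n_false += 1
        fa_rate = n_false / hours
        if fa_rate > max_fa_per_hour:
            break  # false alarms only grow as the threshold drops
        # A threshold accepts every hit with score >= threshold, so record only after all ties
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            best = (n_correct / num_keywords, fa_rate, score)
    return best


hits = [(0.9, True), (0.8, False), (0.7, True), (0.6, False),
        (0.5, True), (0.4, False), (0.3, True)]
print(detection_at_fa_rate(hits, num_keywords=4, hours=1.0, max_fa_per_hour=2.0))
# prints (0.75, 2.0, 0.5)
```

Lowering the threshold admits more hits, so detection accuracy and FA/Hr rise together. The function stops at the lowest threshold that keeps false alarms within budget (2.0 FA/Hr in the toy call, 5.0 FA/Hr in the abstract).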
Pages: 6
Related papers
14 entries in total
[1] [Anonymous], 2012, ARXIV12115590
[2] Bengio Yoshua, 2012, Neural Networks: Tricks of the Trade. Second Edition: LNCS 7700, P437, DOI 10.1007/978-3-642-35289-8_26
[3] Chen GG, 2015, INT CONF ACOUST SPEE, P5236, DOI 10.1109/ICASSP.2015.7178970
[4] Chen Stanley, 1998, Evaluation metrics for language models
[5] Clements M., 2001, P 2001 C NAT ASS BRO
[6] Dahl, George E.; Yu, Dong; Deng, Li; Acero, Alex. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20(01): 30-42
[7] Glorot X., 2011, 14 INT C ART INT STA, P315, DOI 10.1177/1753193410395357
[8] Guoguo Chen, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P4087, DOI 10.1109/ICASSP.2014.6854370
[9] Hinton, Geoffrey; Deng, Li; Yu, Dong; Dahl, George E.; Mohamed, Abdel-rahman; Jaitly, Navdeep; Senior, Andrew; Vanhoucke, Vincent; Nguyen, Patrick; Sainath, Tara N.; Kingsbury, Brian. Deep Neural Networks for Acoustic Modeling in Speech Recognition [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29(06): 82-97
[10] Knill K.M., 1994, SPEAKER DEPENDENT KE