HYBRID CONTEXT DEPENDENT CD-DNN-HMM KEYWORD SPOTTING (KWS) IN SPEECH CONVERSATIONS

Cited by: 0

Authors:

Tyagi, Vivek [1]

Affiliations:

[1] TCS Res Delhi, Delhi, India

Source:

2016 IEEE 26TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2016

Keywords:

Speech Recognition; Deep Neural Networks; Keyword Spotting;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We present a detailed analysis of the phoneme recognition performance of a context dependent tied-state triphone Gaussian Mixture Model Hidden Markov Model (CD-GMM-HMM) acoustic model (AM), a state-of-the-art large AM, and a four hidden layer context dependent Deep Neural Network (CD-DNN-HMM) AM on the WSJ speech corpus. Using a bigram phoneme language model, phoneme recognition experiments are performed on a two hour independent test set using Viterbi decoding, and they show a relative 33.3% improvement by our CD-DNN acoustic model. We then present a filler based Hybrid DNN-HMM Keyword Spotting (KWS) system which, to our knowledge, is the first KWS architecture using context dependent DNN and HMM. In our experiments, a strong baseline CD-GMM-HMM KWS provides 79.0% correct detection accuracy at a false alarm (FA) rate of 5.0 FA/Hr. In contrast, the proposed hybrid CD-DNN-HMM KWS achieves 88.5% correct detection accuracy at 5.0 FA/Hr, a relative improvement of 43.3%. We provide further analysis and conclude that Hybrid CD-DNN-HMM KWS provides an attractive alternative solution for near real-time KWS applications with high detection accuracy and low FA.

References

Pages: 6

14 entries in total

[1]

[Anonymous], 2012, ARXIV12115590

[2]

Bengio Yoshua, 2012, Neural Networks: Tricks of the Trade. Second Edition: LNCS 7700, P437, DOI 10.1007/978-3-642-35289-8_26

[3]

Chen GG, 2015, INT CONF ACOUST SPEE, P5236, DOI 10.1109/ICASSP.2015.7178970

[4]

Chen Stanley, 1998, Evaluation metrics for language models

[5]

Clements M., 2001, P 2001 C NAT ASS BRO

[6] Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition [J].

Dahl, George E. ;

Yu, Dong ;

Deng, Li ;

Acero, Alex .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01) :30-42

[7]

Glorot X., 2011, 14 INT C ART INT STA, P315, DOI 10.1177/1753193410395357

[8]

Guoguo Chen, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P4087, DOI 10.1109/ICASSP.2014.6854370

[9] Deep Neural Networks for Acoustic Modeling in Speech Recognition [J].

Hinton, Geoffrey ;

Deng, Li ;

Yu, Dong ;

Dahl, George E. ;

Mohamed, Abdel-rahman ;

Jaitly, Navdeep ;

Senior, Andrew ;

Vanhoucke, Vincent ;

Nguyen, Patrick ;

Sainath, Tara N. ;

Kingsbury, Brian .

IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) :82-97

[10]

Knill K.M., 1994, SPEAKER DEPENDENT KE
