SPEECH RECOGNITION WITH PREDICTION-ADAPTATION-CORRECTION RECURRENT NEURAL NETWORKS

Cited by: 0
Authors
Zhang, Yu [1]
Yu, Dong [2]
Seltzer, Michael L. [2]
Droppo, Jasha [2]
Affiliations
[1] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Microsoft Res, One Microsoft Way, Redmond, WA USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
Deep Neural Network; DNN; Recurrent neural network; RNN; Prediction-Adaptation-Correction RNN; PAC-RNN;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN (the main DNN) estimates the state posterior probability from both the current frame and the prediction that a prediction DNN makes from the past frames. The output of the main DNN is fed back to the prediction DNN so that it can make better predictions for future frames. The PAC-RNN can be viewed in two ways: given the information in the new, current frame, the main DNN corrects the prediction made by the prediction DNN; alternatively, the prediction DNN's prediction adapts the main DNN's behavior. Experiments on the TIMIT phone recognition task indicate that the PAC-RNN outperforms the DNN, RNN, and LSTM by 2.4%, 2.1%, and 1.9% absolute phone accuracy, respectively. We found that incorporating the prediction objective and including the recurrent loop are both important to the performance of the PAC-RNN.
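The loop described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not say which quantity is exchanged between the two networks, so the sketch assumes hidden activations are passed in both directions, that each network has a single tanh hidden layer, and that the prediction network outputs a distribution over states as its auxiliary prediction. All names (`init_pac_rnn`, `pac_rnn_forward`, `pred_info`) and all layer sizes are hypothetical.

```python
import math
import random


def init_layer(rng, n_out, n_in):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, inputs):
    """Affine transform: one output per weight row."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_pac_rnn(n_feat, n_hidden, n_states, seed=0):
    """Build untrained parameters (sizes are illustrative assumptions)."""
    rng = random.Random(seed)
    return {
        # Correction (main) network: current frame + prediction information.
        "corr_hidden": init_layer(rng, n_hidden, n_feat + n_hidden),
        "corr_out": init_layer(rng, n_states, n_hidden),
        # Prediction network: current frame + feedback from the main network.
        "pred_hidden": init_layer(rng, n_hidden, n_feat + n_hidden),
        "pred_out": init_layer(rng, n_states, n_hidden),
        "n_hidden": n_hidden,
    }


def pac_rnn_forward(frames, params):
    """Run the prediction/correction loop over a sequence of feature frames."""
    # No prediction exists before the first frame, so start from zeros (assumption).
    pred_info = [0.0] * params["n_hidden"]
    posteriors, predictions = [], []
    for frame in frames:
        frame = list(frame)
        # Correction step: state posterior from the current frame and the
        # prediction made on past frames.
        corr_h = [math.tanh(v) for v in dense(params["corr_hidden"], frame + pred_info)]
        posteriors.append(softmax(dense(params["corr_out"], corr_h)))
        # Prediction step: the main network's result is fed back so the
        # prediction network can inform the next frame.
        pred_h = [math.tanh(v) for v in dense(params["pred_hidden"], frame + corr_h)]
        predictions.append(softmax(dense(params["pred_out"], pred_h)))
        pred_info = pred_h  # this carry-over is the recurrent loop
    return posteriors, predictions
```

Calling `pac_rnn_forward(frames, init_pac_rnn(n_feat, n_hidden, n_states))` returns one state posterior and one auxiliary prediction per frame. Training with a combined recognition and prediction objective, which the abstract reports as important, is outside the scope of this sketch.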
Pages: 5004-5008
Page count: 5