Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs

Cited by: 114
Authors
Peddinti, Vijayaditya [1, 2]
Wang, Yiming [1]
Povey, Daniel [1, 2]
Khudanpur, Sanjeev [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA);
Keywords
Acoustic model; long short-term memory (LSTM); recurrent neural networks; time-delay neural networks; NEURAL-NETWORKS; RECOGNITION;
DOI
10.1109/LSP.2017.2723507
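Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract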
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterparts, as they model both the past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to an increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform the state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at lower layers and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300 h LVCSR task and the AMI LVCSR task, in the three microphone conditions.
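The latency argument in the abstract can be illustrated with a small calculation. The sketch below assumes a 10 ms input frame shift and a purely hypothetical layer stack (the splice offsets are illustrative and are not the configuration used in the letter). It counts only the future context that the temporal convolution layers require, treats each unidirectional LSTM layer as adding no look-ahead, and ignores any decoding or chunking delay.

```python
from typing import Sequence

FRAME_SHIFT_MS = 10  # assumed input frame shift; not stated in the abstract


def lookahead_frames(layer_offsets: Sequence[Sequence[int]]) -> int:
    """Total future context, in input frames, of a stack of layers.

    Each layer is described by the time offsets it splices together,
    e.g. (-3, 0, 3). Only the largest positive offset of a layer adds
    look-ahead. A unidirectional LSTM layer is written as (0,) and
    therefore adds none.
    """
    return sum(max(0, max(offsets)) for offsets in layer_offsets)


def lookahead_ms(layer_offsets: Sequence[Sequence[int]],
                 frame_shift_ms: int = FRAME_SHIFT_MS) -> int:
    """Model look-ahead converted to milliseconds."""
    return lookahead_frames(layer_offsets) * frame_shift_ms


# Hypothetical interleaved TDNN/LSTM stack (illustrative offsets only).
example_stack = [
    (-2, -1, 0, 1, 2),  # TDNN layer
    (-1, 0, 1),         # TDNN layer
    (0,),               # unidirectional LSTM layer
    (-3, 0, 3),         # TDNN layer
    (0,),               # unidirectional LSTM layer
    (-3, 0, 3),         # TDNN layer
    (0,),               # unidirectional LSTM layer
]

if __name__ == "__main__":
    print(lookahead_frames(example_stack), "frames")
    print(lookahead_ms(example_stack), "ms")
```

Under these assumptions, the future context grows only with the right-hand offsets of the convolution layers, which is why such a stack can be held within a fixed budget such as the 200 ms quoted above, whereas a bidirectional LSTM needs the future frames of the whole utterance or chunk.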
Pages: 373-377
Number of pages: 5
Related papers
29 in total
[1] Amodei D, 2016, PR MACH LEARN RES, V48
[2] [Anonymous], 2017, COMP TDNN LSTMS CNN
[3] [Anonymous], 2017, CODE REPRODUCE RESUL
[4] Bourlard H.A., 1994, Connectionist speech recognition: a hybrid approach, V247
[5] Carletta J, 2005, LECT NOTES COMPUT SC, V3869, P28
[6] Chan W, 2016, INT CONF ACOUST SPEE, P4960, DOI 10.1109/ICASSP.2016.7472621
[7] Chen K, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P3600
[8] Cheng G., P INTERSPEE IN PRESS
[9] Doetsch, Patrick; Kozielski, Michal; Ney, Hermann. Fast and robust training of recurrent neural networks for offline handwriting recognition [J]. 2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014: 279-284
[10] Graves A, 2005, LECT NOTES COMPUT SC, V3697, P799