Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling

Cited by: 0
Authors
Sak, Hasim [1 ]
Senior, Andrew [1 ]
Beaufays, Francoise [1 ]
Affiliations
[1] Google, New York, NY USA
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Keywords
Long Short-Term Memory; LSTM; recurrent neural network; RNN; speech recognition; acoustic modeling
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters.
Pages: 338-342
Page count: 5
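
A minimal sketch of the projection idea described in the abstract may help make it concrete. The code below is an illustrative NumPy reconstruction of a single time step of an LSTM cell followed by a linear recurrent projection layer (the LSTMP configuration), not the authors' implementation: peephole connections are omitted, and the dimensions, variable names, and the lstmp_step function itself are assumptions made for this example.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstmp_step(x_t, r_prev, c_prev, W, b, W_proj):
    # x_t    : input vector at time t, shape (n_input,)
    # r_prev : previous projected recurrent state, shape (n_proj,)
    # c_prev : previous cell state, shape (n_cell,)
    # W      : stacked gate weights, shape (4 * n_cell, n_input + n_proj)
    # b      : stacked gate biases, shape (4 * n_cell,)
    # W_proj : linear recurrent projection matrix, shape (n_proj, n_cell)
    n_cell = c_prev.shape[0]
    z = W @ np.concatenate([x_t, r_prev]) + b
    i = sigmoid(z[:n_cell])                  # input gate
    f = sigmoid(z[n_cell:2 * n_cell])        # forget gate
    g = np.tanh(z[2 * n_cell:3 * n_cell])    # candidate cell update
    o = sigmoid(z[3 * n_cell:])              # output gate
    c_t = f * c_prev + i * g                 # new cell state
    m_t = o * np.tanh(c_t)                   # LSTM cell output
    r_t = W_proj @ m_t                       # projected output fed back into the recurrence
    return r_t, c_t

Because the recurrence and the next layer see only the n_proj-dimensional projected state rather than the full n_cell-dimensional cell output, the recurrent weight count falls from roughly 4 * n_cell^2 to about (4 * n_cell + n_cell) * n_proj, which is one way to read the abstract's claim that the architecture "makes more effective use of model parameters".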