Deep Long Short-Term Memory Networks for Speech Recognition

Cited by: 0

Authors:

Chien, Jen-Tzung [1]

Misbullah, Alim [1]

Affiliations:

[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan

Source:

2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016

Keywords:

speech recognition; acoustic modeling; hybrid neural network; long short-term memory;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Speech recognition has been significantly improved by acoustic models based on deep neural networks, which can be realized as a feedforward NN (FNN) or a recurrent NN (RNN). In general, an FNN projects the observations onto a deep invariant feature space, while an RNN captures the temporal information in sequential data for speech recognition. An RNN based on long short-term memory (LSTM) is capable of storing inputs over a long time period and thus exploits a self-learned mechanism for long-range temporal context. Considering the complementary modeling capabilities of FNN and RNN, this paper presents a deep model constructed by stacking LSTM and FNN. Through the cascade of LSTM cells and fully-connected feedforward units, we explore the temporal patterns and summarize the long history of previous inputs in a deep learning machine. Experiments on the 3rd CHiME challenge and Aurora-4 show that stacks of the hybrid model with an FNN post-processor outperform stand-alone FNN and LSTM models and the other hybrid models for robust speech recognition.
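The stacking idea in the abstract (LSTM cells followed by fully-connected feedforward units) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract gives no layer sizes, gate equations, activation functions, or training details, so the standard LSTM gate formulation, the ReLU activation, the softmax output layer, and every name and dimension below (lstm_layer, feedforward_layer, hybrid_forward, init_params, D, H, F, K) are assumptions chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_layer(inputs, W, U, b):
    """Run one unidirectional LSTM layer over a sequence.

    inputs: (T, D) frames; W: (D, 4H); U: (H, 4H); b: (4H,).
    Returns the (T, H) sequence of hidden states.
    """
    T = inputs.shape[0]
    H = U.shape[0]
    h = np.zeros(H)  # hidden state
    c = np.zeros(H)  # memory cell carrying long-range context
    outputs = np.zeros((T, H))
    for t in range(T):
        z = inputs[t] @ W + h @ U + b
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs[t] = h
    return outputs


def feedforward_layer(x, W, b):
    """Fully-connected layer with ReLU, applied to each frame independently."""
    return np.maximum(0.0, x @ W + b)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def hybrid_forward(frames, params):
    """LSTM layer, then a feedforward post-processor, then frame-level posteriors."""
    h = lstm_layer(frames, *params["lstm"])
    h = feedforward_layer(h, *params["ffn"])
    W_out, b_out = params["out"]
    return softmax(h @ W_out + b_out)


def init_params(rng, D, H, F, K, scale=0.1):
    """Random weights for illustration only (hypothetical sizes, no training)."""
    return {
        "lstm": (scale * rng.standard_normal((D, 4 * H)),
                 scale * rng.standard_normal((H, 4 * H)),
                 np.zeros(4 * H)),
        "ffn": (scale * rng.standard_normal((H, F)), np.zeros(F)),
        "out": (scale * rng.standard_normal((F, K)), np.zeros(K)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, D=40, H=16, F=32, K=10)
    frames = rng.standard_normal((20, 40))  # 20 frames of 40-dim features
    posteriors = hybrid_forward(frames, params)
    print(posteriors.shape)  # (20, 10)
```

In this sketch the LSTM layer summarizes the history of earlier frames, and the feedforward layer then transforms each hidden state frame by frame, which is one plausible reading of "FNN post-processor". A deeper stack would repeat the LSTM and feedforward pair, feeding each feedforward output into the next LSTM layer.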

References

Pages: 5

28 entries in total

[1]

[Anonymous], 2013, P IEEE INT C AC SPEE, DOI 10.1109/ICASSP.2013.6638947

[2]

[Anonymous], ARXIV12115063

[3]

[Anonymous], 2011, P INT C FLOR IT 27 3

[4]

Barker J, 2015, 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P504, DOI 10.1109/ASRU.2015.7404837

[5]

Chao Weng, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P5532, DOI 10.1109/ICASSP.2014.6854661

[6]

Chen, Kai; Huo, Qiang. Training Deep Bidirectional LSTM Acoustic Model for LVCSR by a Context-Sensitive-Chunk BPTT Approach. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (07): 1185-1193

[7]

Chien JT, 2014, IEEE W SP LANG TECH, P206, DOI 10.1109/SLT.2014.7078575

[8]

Chien JT, 2014, IEEE W SP LANG TECH, P147, DOI 10.1109/SLT.2014.7078565

[9]

Chien JT, 2015, INT CONF ACOUST SPEE, P4560, DOI 10.1109/ICASSP.2015.7178834

[10]

Chien, Jen-Tzung; Ku, Yuan-Chu. Bayesian Recurrent Neural Network for Language Modeling. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (02): 361-374
