Long Short-Term Memory Networks for Noise Robust Speech Recognition

Cited by: 0

Authors:

Woellmer, Martin [1]

Sun, Yang [1]

Eyben, Florian [1]

Schuller, Bjoern [1]

Affiliation:

[1] Tech Univ Munich, Inst Human Machine Commun, D-80290 Munich, Germany

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010

Keywords:

Long Short-Term Memory; Recurrent Neural Networks; Speech Recognition; Noise Robustness; Dynamic Bayesian Networks;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this paper, we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network, which exploits long-range context information for phoneme prediction, and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation variants as well as typical phoneme confusions of the BLSTM predictor in order to compensate for signal disturbances. Unlike conventional Hidden Markov Model (HMM) systems, the proposed architecture is not based on Gaussian mixture modeling. Even without any feature enhancement, our BLSTM-DBN system outperforms a baseline HMM recognizer by up to 18%.


Pages: 2966-2969

Page count: 4
