FUSION OF MULTIPLE UNCERTAINTY ESTIMATORS AND PROPAGATORS FOR NOISE ROBUST ASR

Cited by: 0
Authors
Tran, Dung T. [1]
Vincent, Emmanuel [1]
Jouvet, Denis [1]
Affiliations
[1] Inria, F-54600 Villers Les Nancy, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Noise robust ASR; uncertainty handling; SPEECH RECOGNITION; ENHANCEMENT; COMPENSATION;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Uncertainty decoding has been successfully used for speech recognition in highly nonstationary noise environments. Yet, accurate estimation of the uncertainty on the denoised signals and its propagation to the features remain difficult. In this work, we propose to fuse the uncertainty estimates obtained from different uncertainty estimators and propagators by linear combination. The fusion coefficients are optimized by minimizing a measure of divergence with oracle estimates on development data. Using the Kullback-Leibler divergence, we obtain an 18% relative error rate reduction on the 2nd CHiME Challenge with respect to conventional decoding, which is about twice the reduction achieved by the best single uncertainty estimator and propagator.
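The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the divergence, the constraints on the coefficients, or the optimizer, so everything below is an assumption: the generalized Kullback-Leibler divergence between oracle and fused variances, nonnegative coefficients, and a standard multiplicative update of the kind used in nonnegative matrix factorization. The function names (`generalized_kl`, `fuse`, `fit_fusion_weights`) and the toy data are hypothetical and are not taken from the paper.

```python
import math


def generalized_kl(target, approx):
    """Generalized Kullback-Leibler divergence between two positive sequences."""
    return sum(t * math.log(t / a) - t + a for t, a in zip(target, approx))


def fuse(estimates, weights):
    """Linearly combine K uncertainty estimates (each a list of variances)."""
    n = len(estimates[0])
    return [sum(w * est[i] for w, est in zip(weights, estimates)) for i in range(n)]


def fit_fusion_weights(estimates, oracle, n_iter=500):
    """Fit nonnegative fusion weights by multiplicative updates (assumed optimizer).

    Each pass rescales every weight by the ratio of the two parts of the
    gradient of the generalized KL divergence, which keeps the weights nonnegative.
    """
    k = len(estimates)
    weights = [1.0 / k] * k  # start from a uniform combination
    for _ in range(n_iter):
        fused = fuse(estimates, weights)
        weights = [
            w * sum(e * o / f for e, o, f in zip(est, oracle, fused)) / sum(est)
            for w, est in zip(weights, estimates)
        ]
    return weights


# Toy development data: two estimators and an oracle built as 0.3*e1 + 0.7*e2.
estimates = [[1.0, 2.0, 4.0, 0.5], [3.0, 1.0, 0.5, 2.0]]
oracle = [0.3 * a + 0.7 * b for a, b in zip(*estimates)]
weights = fit_fusion_weights(estimates, oracle)
print(weights)
```

In this toy setting the oracle is an exact linear combination of the two estimates, so the fitted weights should approach the coefficients used to build it. On real development data the oracle would come from a reference clean signal and the fit would only be approximate.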
Pages: 5
Related papers
21 in total
[1]  
[Anonymous], THESIS TU BERLIN
[2]  
[Anonymous], THESIS CAMBRIDGE U
[3]  
[Anonymous], 2000, P ANN C INT SPEECH C
[4]   Integration of beamforming and uncertainty-of-observation techniques for robust ASR in multi-source environments [J].
Astudillo, Ramon Fernandez ;
Kolossa, Dorothea ;
Abad, Alberto ;
Zeiler, Steffen ;
Saeidi, Rahim ;
Mowlaee, Pejman ;
da Silva Neto, Joao Paulo ;
Martin, Rainer .
COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) :837-850
[5]   Robust automatic speech recognition with missing and unreliable acoustic data [J].
Cooke, M ;
Green, P ;
Josifovski, L ;
Vizinho, A .
SPEECH COMMUNICATION, 2001, 34 (03) :267-285
[6]   Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds [J].
Delcroix, Marc ;
Kinoshita, Keisuke ;
Nakatani, Tomohiro ;
Araki, Shoko ;
Ogawa, Atsunori ;
Hori, Takaaki ;
Watanabe, Shinji ;
Fujimoto, Masakiyo ;
Yoshioka, Takuya ;
Oba, Takanobu ;
Kubo, Yotaro ;
Souden, Mehrez ;
Hahm, Seong-Jun ;
Nakamura, Atsushi .
COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) :851-873
[7]   Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing [J].
Delcroix, Marc ;
Nakatani, Tomohiro ;
Watanabe, Shinji .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02) :324-334
[8]   Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion [J].
Deng, L ;
Droppo, J ;
Acero, A .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03) :412-421
[9]  
Deng L, 2011, ROBUST SPEECH RECOGNITION OF UNCERTAIN OR MISSING DATA: THEORY AND APPLICATIONS, P67, DOI 10.1007/978-3-642-21317-5_4
[10]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (06) :1109-1121