An SNR-incremental stochastic matching algorithm for noisy speech recognition

Cited by: 8
Authors
Huang, CS [1 ]
Wang, HC [1 ]
Lee, CH [1 ]
Affiliation
[1] Philips Res E Asia, Taipei 100, Taiwan
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 08
Keywords
expectation-maximization (EM) algorithm; robust speech recognition; stochastic matching
DOI
10.1109/89.966089
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, a signal-to-noise ratio (SNR)-incremental stochastic matching (SISM) algorithm is proposed for robust speech recognition in noisy environments. The SISM algorithm extends Sankar and Lee's stochastic matching (SM) to deal with the distortion due to additive noise. We address two issues concerning the original maximum likelihood-based SM techniques. The first is that the initial condition of the expectation-maximization (EM) algorithm has to be set carefully if the mismatch between training and testing is large. The second is that performance is often limited by the newly adapted model in noise compensation, rather than reaching the higher level of accuracy often obtained in clean environments. Our proposed SISM algorithm attempts to improve the initial condition and to relax the performance bound. First, the SISM algorithm provides a good initial condition by making use of a set of environment-matched models. Second, it operates recursively: the reference model in each recursion is changed along the direction of increasing SNR in order to push the recognition performance toward that obtained at higher SNR levels. Experimental results show that the SISM algorithm provides further improvement after the best environment-matched performance has been reached, and can therefore obtain additional discriminative power by using speech models with higher SNR instead of a retraining process.
Pages: 866-873
Page count: 8
Related papers
22 entries in total
[1]  
Acero A., 1992, ACOUSTICAL ENV ROBUS
[2]   A general joint additive and convolutive bias compensation approach applied to noisy Lombard speech recognition [J].
Afify, M ;
Gong, YF ;
Haton, JP .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (06) :524-538
[3]  
[Anonymous], 1996, THESIS CARNEGIE MELL
[4]  
[Anonymous], 1994, NIST SPEECH QUAL ASS
[5]   SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J].
BOLL, SF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :113-120
[6]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]   CEPSTRAL PARAMETER COMPENSATION FOR HMM RECOGNITION IN NOISE [J].
GALES, MJF ;
YOUNG, SJ .
SPEECH COMMUNICATION, 1993, 12 (03) :231-239
[8]   SPEECH RECOGNITION IN NOISY ENVIRONMENTS - A SURVEY [J].
GONG, YF .
SPEECH COMMUNICATION, 1995, 16 (03) :261-291
[9]  
HUANG CS, 2000, P INT S CHIN SPOK LA, P231
[10]  
HUANG CS, 1998, P INT S CHIN SPOK LA, P216