Switching linear dynamical systems for noise robust speech recognition

Cited by: 44
Authors
Mesot, Bertrand [1 ]
Barber, David [1 ]
Affiliation
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2007 / Vol. 15 / No. 6
Keywords
approximate inference; expectation correction; isolated digit recognition; linear dynamical system; noise robustness; switching autoregressive process
DOI
10.1109/TASL.2007.901312
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Real-world applications such as hands-free dialling in cars may have to cope with very noisy environments. Existing state-of-the-art solutions to this problem use feature-based HMMs with a preprocessing stage to clean the noisy signal. However, the effect that raw signal noise has on the induced HMM features is poorly understood and limits the performance of the HMM system. An alternative to feature-based HMMs is to model the raw signal, which has the potential advantage that including an explicit noise model is straightforward. Here we jointly model the dynamics of both the raw speech signal and the noise, using a Switching Linear Dynamical System (SLDS). The new model was tested on isolated digit utterances corrupted by Gaussian noise. Unlike the Autoregressive HMM and its derivatives, which provide a model of uncorrupted raw speech, the SLDS is comparatively noise robust and also significantly outperforms a state-of-the-art feature-based HMM. The computational complexity of exact inference in the SLDS scales exponentially with the length of the time series. To counter this, we use Expectation Correction, which provides a stable and accurate linear-time approximation for this important class of models, aiding their further application in acoustic modeling.
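A minimal sketch of the generative idea in the abstract, written in Python with the standard library only. It assumes a two-state switching AR(2) model for the clean speech plus additive white Gaussian noise on the observation; the coefficients, transition probabilities, noise levels and the names `sample_noisy_switching_ar` and `exact_filter_components` are hypothetical illustrations, not values or code from the paper. The second helper only counts the Gaussian components in the exact filtered posterior, which is the source of the exponential cost mentioned above; it does not implement Expectation Correction.

```python
import random

# Hypothetical parameters, chosen only for illustration (not taken from the paper).
# Each switch state s selects its own AR(2) coefficients (a1, a2) for clean speech.
AR_COEFFS = {0: (1.6, -0.8), 1: (0.5, 0.3)}
# Row s gives (p(s_t = 0 | s_{t-1} = s), p(s_t = 1 | s_{t-1} = s)).
TRANSITION = {0: (0.95, 0.05), 1: (0.10, 0.90)}
INNOVATION_STD = 0.1  # std of the clean-speech AR innovation e_t
NOISE_STD = 0.5       # std of the additive Gaussian observation noise n_t


def sample_noisy_switching_ar(length, seed=0):
    """Sample switch states, clean signal and noisy observations.

    Model sketched here (an assumption for illustration):
      s_t ~ p(s_t | s_{t-1})                                 discrete Markov switch
      x_t = a1(s_t) x_{t-1} + a2(s_t) x_{t-2} + e_t,  e_t ~ N(0, INNOVATION_STD^2)
      v_t = x_t + n_t,                                n_t ~ N(0, NOISE_STD^2)
    With hidden state h_t = (x_t, x_{t-1}) this is a switching linear
    dynamical system whose observation is v_t = (1, 0) h_t + n_t.
    """
    rng = random.Random(seed)
    states, clean, noisy = [], [], []
    s, x1, x2 = 0, 0.0, 0.0  # initial switch state and AR history
    for _ in range(length):
        # Draw the next switch state from the current row of the transition table.
        s = 0 if rng.random() < TRANSITION[s][0] else 1
        a1, a2 = AR_COEFFS[s]
        x = a1 * x1 + a2 * x2 + rng.gauss(0.0, INNOVATION_STD)
        states.append(s)
        clean.append(x)
        noisy.append(x + rng.gauss(0.0, NOISE_STD))  # explicit additive noise model
        x1, x2 = x, x1
    return states, clean, noisy


def exact_filter_components(num_states, t):
    """Number of Gaussians in the exact filtered posterior p(h_t | v_{1:t}).

    Exact filtering keeps one Gaussian per switch path s_{1:t}, so the count
    grows as num_states ** t; approximate methods cap this number instead.
    """
    return num_states ** t
```

For example, `sample_noisy_switching_ar(200)` returns three lists of length 200, and `exact_filter_components(2, 100)` shows why exact inference is infeasible for even short utterances with only two switch states.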
Pages: 1850-1858
Page count: 9