Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method

Cited by: 33
Authors
Dharanipragada, Satya [1]
Yapanel, Umit H. [2]
Rao, Bhaskar D. [3]
Affiliations
[1] Citadel Investment Grp, Chicago, IL 60603 USA
[2] Univ Colorado, Boulder, CO 80302 USA
[3] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 1
Funding
U.S. National Science Foundation
Keywords
distortionless response; minimum variance; robust feature extraction for continuous speech recognition; spectral analysis; speech analysis
DOI
10.1109/TASL.2006.876776
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We consider incorporating perceptual information in two ways: 1) after the MVDR power spectrum is computed and 2) directly during the MVDR spectrum estimation. We show that incorporating perceptual information directly into the spectrum estimation improves both robustness and computational efficiency significantly. We analyze the class separability and speaker variability properties of the features using a Fisher linear discriminant measure and show that these features provide better class separability and better suppression of speaker-dependent information than the widely used mel frequency cepstral coefficient (MFCC) features. We evaluate the technique on four different tasks: an in-car speech recognition task, the Aurora-2 matched task, the Wall Street Journal (WSJ) task, and the Switchboard task. The new feature extraction technique gives lower word error rates than the MFCC and perceptual linear prediction (PLP) feature extraction techniques in most cases. Statistical significance tests reveal that the improvement is most significant in high noise conditions. The technique thus provides improved robustness to noise without sacrificing performance in clean conditions.
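The abstract's central tool is the classical MVDR (Capon) spectrum estimate, S(ω) = 1 / (e(ω)ᴴ R⁻¹ e(ω)), where R is the signal's autocorrelation matrix and e(ω) the steering vector. A minimal sketch of this baseline estimator is given below; the function name, model order, and test signal are illustrative choices, not the paper's implementation (which further adds perceptual warping and cepstral processing on top of the MVDR spectrum).

```python
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(x, order=12, n_freq=256):
    """MVDR (Capon) power spectrum estimate of signal x.

    Evaluates S(w) = 1 / (e(w)^H R^{-1} e(w)) at n_freq frequencies
    in [0, pi), where R is the (order+1)x(order+1) Toeplitz
    autocorrelation matrix and e(w) the steering vector.
    """
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    R_inv = np.linalg.inv(toeplitz(r))
    omegas = np.pi * np.arange(n_freq) / n_freq
    spec = np.empty(n_freq)
    for i, w in enumerate(omegas):
        e = np.exp(1j * w * np.arange(order + 1))  # steering vector e(w)
        spec[i] = 1.0 / np.real(e.conj() @ R_inv @ e)
    return omegas, spec

# Illustrative use: a noisy sinusoid at 0.3*pi should produce a
# spectral peak near that frequency.
rng = np.random.default_rng(0)
n = 512
x = np.cos(0.3 * np.pi * np.arange(n)) + 0.05 * rng.standard_normal(n)
omegas, spec = mvdr_spectrum(x, order=12, n_freq=256)
peak_freq = omegas[np.argmax(spec)]
```

Compared with raw FFT periodograms, the MVDR estimate adapts a distortionless filter to each frequency, which is the smoothness/robustness property the paper exploits for feature extraction.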
Pages: 224-234
Page count: 11
References
35 records total
[1] Anonymous. Proc. Interspeech.
[2] Anonymous. Proceedings of the IEEE, 2003.
[3] Bahl LR. Proc. ICASSP, 1995, 1:41.
[4] Capon J. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 1969, 57(8):1408-1418.
[5] Chen SS. Proc. ICASSP, 1998:645. DOI 10.1109/ICASSP.1998.675347.
[6] Davis SB, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4):357-366.
[7] Dharanipragada S. Proc. IEEE ICASSP, 2001, 1:309.
[8] Dharanipragada S. Proc. ICSLP, 1998, 3:967.
[9] Duda R. Pattern Classification, 1993.
[10] Forney GD. The Viterbi algorithm. Proceedings of the IEEE, 1973, 61(3):268-278.