Using Spectro-Temporal Features to Improve AFE Feature Extraction for ASR

Cited by: 0
Authors
Ravuri, Suman V. [1 ]
Morgan, Nelson [1 ]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
automatic speech recognition; spectro-temporal features; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous work has shown that spectro-temporal features reduce word error rate (WER) for automatic speech recognition (ASR) under noisy conditions. The spectro-temporal framework, however, is not the only way to process features in order to reduce errors due to noise in the signal. The two-stage mel-warped Wiener filtering method used in the "Advanced Front End" (AFE), now a standard front end for robust recognition, is another way. Since the spectro-temporal approach can be applied to a noise-reduced spectrum, we wanted to explore whether spectro-temporal features could improve the performance of AFE for ASR. We show that computing spectro-temporal features after AFE processing results in a 45% relative improvement compared to AFE in clean conditions and a 6% to 30% improvement in noisy conditions on the Aurora2 clean training setup.
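The abstract reports its gains as relative improvements. As a minimal sketch of how such a figure is commonly computed, assuming the usual definition of relative WER reduction (the record itself does not state the formula), the function name `relative_improvement` and the WER values below are hypothetical and are not taken from the paper:

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Assumes the common definition (baseline - new) / baseline; the source
    record does not spell out which definition the authors used.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WER values in percent, chosen only for illustration.
    baseline = 10.0   # e.g. a baseline front end
    improved = 8.0    # e.g. the same system with added features
    print(f"relative improvement: {relative_improvement(baseline, improved):.0%}")
```

Under this definition, a drop from 10.0% to 8.0% WER is a 20% relative improvement, even though the absolute change is only 2 percentage points.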
Pages: 1181-1184
Number of pages: 4
Related papers
19 in total
[1]  
Agarwal A., 1999, Proc. ASRU, V99, P67, DOI 10.1.1.34.1207
[2]  
[Anonymous], 2003, 8 EUROPEAN C SPEECH
[3]  
[Anonymous], 2002, ETSI ES
[4]  
Bourlard H., 1996, P INT C SPOK LANG PR, P422
[5]   Spectro-temporal modulation transfer functions and speech intelligibility [J].
Chi, TS ;
Gao, YJ ;
Guyton, MC ;
Ru, PW ;
Shamma, S .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (05) :2719-2732
[6]  
Cole R., 1994, ICSLP 94. 1994 International Conference on Spoken Language Processing, P1815
[7]   Hierarchical spectro-temporal features for robust speech recognition [J].
Domont, Xavier ;
Heckmann, Martin ;
Joublin, Frank ;
Goerick, Christian .
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, :4417-4420
[8]  
Gelbart D., NOISY NUMBERS DATA N
[9]  
Hermansky H, 2000, INT CONF ACOUST SPEE, P1635, DOI 10.1109/ICASSP.2000.862024
[10]  
Hermansky H., 2005, Proc. of Interspeech 2005, P361