Using Spectro-Temporal Features to Improve AFE Feature Extraction for ASR

Cited by: 0

Authors:

Ravuri, Suman V. [1]

Morgan, Nelson [1]

Affiliations:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

automatic speech recognition; spectro-temporal features; SPEECH;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Previous work has shown that spectro-temporal features reduce WER for automatic speech recognition under noisy conditions. The spectro-temporal framework, however, is not the only way to process features in order to reduce errors due to noise in the signal. The two-stage mel-warped Wiener filtering method used in the "Advanced Front End" (AFE), now a standard front end for robust recognition, is another way. Since the spectro-temporal approach can be applied to a noise-reduced spectrum, we wanted to explore whether spectro-temporal features could improve the performance of AFE for ASR. We show that computing spectro-temporal features after AFE processing results in a 45% relative improvement compared to AFE in clean conditions and a 6% to 30% improvement in noisy conditions on the Aurora2 clean training setup.
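
The relative improvement figures in the abstract are presumably relative reductions in word error rate (WER), which is the usual convention: the drop in WER divided by the baseline WER. A minimal sketch under that assumption follows; the function name and the WER values are hypothetical illustrations, not results reported in the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (illustrative only, not taken from the paper).
baseline = 2.0  # assumed WER of an AFE-only system
combined = 1.1  # assumed WER after adding spectro-temporal features
print(f"{relative_improvement(baseline, combined):.0%} relative improvement")
```

Under this convention a large relative gain can correspond to a small absolute change when the baseline WER is already low, as is typical in clean conditions.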


Pages: 1181-1184

Page count: 4
