Normalizing the speech modulation spectrum for robust speech recognition

Cited by: 0

Authors:

Xiao, Xiong [1,2]

Chng, Eng Siong [1]

Li, Haizhou [1,2]

Affiliations:

[1] Nanyang Technological University, School of Computer Engineering, Singapore, Singapore

[2] Institute for Infocomm Research, Singapore, Singapore

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

speech recognition; feature normalization; modulation spectrum; square-root Wiener filter; temporal filter;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel feature normalization technique for robust speech recognition. The proposed technique normalizes the temporal structure of the features to reduce the feature variation caused by environmental interference. Specifically, it normalizes the utterance-dependent feature modulation spectrum to a reference function by filtering the features with a square-root Wiener filter in the temporal domain. We show experimentally that the proposed technique, when combined with mean and variance normalization (MVN), significantly reduces the word error rate on the AURORA-2 task, with a relative error rate reduction of 69.11% compared to the base me.
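The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the modulation spectrum is estimated, how the reference function is obtained, or how the filter is realized, so everything below is an assumption made for illustration: a plain periodogram as the spectrum estimate, a zero-phase gain applied in the frequency domain, and the hypothetical names `mvn`, `modulation_psd`, and `normalize_modulation_spectrum`. The gain is the square root of the ratio between the reference spectrum and the utterance spectrum, which is the usual form of a square-root Wiener filter.

```python
import numpy as np


def mvn(features):
    """Per-utterance mean and variance normalization.

    features: array of shape (frames, dims).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / np.maximum(std, 1e-8)


def modulation_psd(trajectory, n_fft):
    """Periodogram of one feature trajectory.

    Assumption: a raw periodogram stands in for whatever spectrum
    estimate the paper actually uses.
    """
    spectrum = np.fft.rfft(trajectory, n=n_fft)
    return (np.abs(spectrum) ** 2) / len(trajectory)


def normalize_modulation_spectrum(features, reference_psd, n_fft, floor=1e-10):
    """Scale each trajectory's modulation spectrum toward a reference.

    features:      (frames, dims) feature matrix of one utterance.
    reference_psd: (n_fft // 2 + 1, dims) target spectrum per dimension
                   (hypothetical; e.g. an average over clean training data).
    """
    n_frames, n_dims = features.shape
    out = np.empty((n_frames, n_dims), dtype=float)
    for d in range(n_dims):
        spectrum = np.fft.rfft(features[:, d], n=n_fft)
        psd = (np.abs(spectrum) ** 2) / n_frames
        # Square-root gain: the output spectrum becomes gain**2 * psd,
        # which equals reference_psd wherever psd is above the floor.
        gain = np.sqrt(reference_psd[:, d] / np.maximum(psd, floor))
        filtered = np.fft.irfft(gain * spectrum, n=n_fft)
        out[:, d] = filtered[:n_frames]
    return out
```

With `n_fft` equal to the number of frames, the filtered trajectory's periodogram matches the reference exactly. A practical system would more likely use a smoothed spectrum estimate and a short temporal filter, and the order in which this step and MVN are applied is not stated in the abstract.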


Pages: 1021 / +

Page count: 2
