Dynamic Features in the Linear Domain for Robust Automatic Speech Recognition in a Reverberant Environment

Cited by: 0

Authors:

Ichikawa, Osamu [1]

Fukuda, Takashi [1]

Tachibana, Ryuki [1]

Nishimura, Masafumi [1]

Affiliations:

[1] IBM Res, Tokyo Res Lab, Tokyo, Japan

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Keywords:

automatic speech recognition; dynamic feature; reverberation; linear delta; delta; MFCC;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because MFCCs are calculated from logarithmic spectra, their delta and delta-delta features amount to difference operations in the logarithmic domain. In a reverberant environment, speech signals have trailing reverberation whose power follows a long-term exponential decay. As a result, the logarithmic delta value tends to remain large for a long time. This paper proposes a delta feature calculated in the linear domain, where the reverberation decays rapidly. In an experiment using the CENSREC-4 evaluation framework, simply replacing the MFCC dynamic features with the proposed dynamic features yielded significant improvements in reverberant conditions.
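The intuition in the abstract can be illustrated numerically. The sketch below is not the paper's method, whose exact formulation is not given in this record. It only applies a common regression-style delta to a hypothetical exponentially decaying power envelope, once in the log domain and once in the linear domain. The decay factor, the window size `k`, and the function name `delta` are assumptions chosen for illustration.

```python
import math

def delta(seq, k=2):
    """Regression-style delta over +/-k frames (edge frames are clamped)."""
    n = len(seq)
    norm = 2 * sum(i * i for i in range(1, k + 1))
    out = []
    for t in range(n):
        acc = 0.0
        for i in range(1, k + 1):
            acc += i * (seq[min(t + i, n - 1)] - seq[max(t - i, 0)])
        out.append(acc / norm)
    return out

# Hypothetical reverberation tail: power falls by a fixed factor per frame.
decay_per_frame = 0.8  # assumed value, for illustration only
power = [decay_per_frame ** t for t in range(40)]
log_power = [math.log(p) for p in power]

log_delta = delta(log_power)
linear_delta = delta(power)

# Compare delta magnitudes early and late in the tail,
# using frames away from the clamped edges.
early, late = 5, 30
log_ratio = abs(log_delta[late]) / abs(log_delta[early])
linear_ratio = abs(linear_delta[late]) / abs(linear_delta[early])
print(f"log-domain delta ratio (late/early):    {log_ratio:.3f}")
print(f"linear-domain delta ratio (late/early): {linear_ratio:.6f}")
```

Under these assumptions the log-domain delta keeps the same magnitude throughout the tail, while the linear-domain delta shrinks along with the power itself. This matches the abstract's statement that the logarithmic delta stays large for a long time in reverberation.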


Pages: 44-47

Number of pages: 4
