Dynamic Features in the Linear Domain for Robust Automatic Speech Recognition in a Reverberant Environment

Cited by: 0
Authors
Ichikawa, Osamu [1]
Fukuda, Takashi [1]
Tachibana, Ryuki [1]
Nishimura, Masafumi [1]
Affiliation
[1] IBM Res, Tokyo Res Lab, Tokyo, Japan
Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Keywords
automatic speech recognition; dynamic feature; reverberation; linear delta; delta; MFCC
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because MFCCs are calculated from logarithmic spectra, the delta and delta-delta features are effectively difference operations in the logarithmic domain. In a reverberant environment, speech signals have trailing reverberation whose power follows a long-term exponential decay. In the logarithmic domain such a decay is a steadily falling straight line, so the logarithmic delta value tends to remain large for a long time. This paper proposes a delta feature calculated in the linear domain, where the reverberant tail decays rapidly. In experiments using the CENSREC-4 evaluation framework, simply replacing the MFCC dynamic features with the proposed dynamic features yielded significant improvements under reverberant conditions.
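The contrast the abstract draws can be illustrated with a toy example. This is only a sketch under stated assumptions, not the paper's formulation, which this record does not give: it uses a single band whose power decays exponentially (a stand-in for a reverberant tail), the widely used regression formula for delta features, and hypothetical names (`delta`, `decay_per_frame`, `K`).

```python
import numpy as np

def delta(x, K=2):
    """Regression-style delta over +/-K frames; edge frames are repeated.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    padded = np.pad(x, K, mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = np.zeros(T)
    for k in range(1, K + 1):
        # k * (x[t + k] - x[t - k]) for every frame t
        out += k * (padded[K + k : K + k + T] - padded[K - k : K - k + T])
    return out / denom

# Assumed toy reverberant tail: power shrinks by a fixed factor per frame.
decay_per_frame = 0.9
frames = np.arange(60)
power = decay_per_frame ** frames

log_delta = delta(np.log(power))  # delta taken in the logarithmic domain
lin_delta = delta(power)          # delta taken in the linear domain

# Away from the padded edges, the log-domain delta stays at a constant
# nonzero value for the whole tail, while the magnitude of the
# linear-domain delta shrinks along with the power itself.
print(log_delta[5], log_delta[50])
print(abs(lin_delta[5]), abs(lin_delta[50]))
```

In this toy setting the log-domain delta never settles toward zero during the decay, whereas the linear-domain delta does, which is the behavior the abstract appeals to.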
Pages: 44-47
Page count: 4