On the relative importance of various components of the modulation spectrum for automatic speech recognition

Cited by: 87
Authors
Kanedera, N [1]
Arai, T
Hermansky, H
Pavel, M
Affiliations
[1] Ishikawa Natl Coll Technol, Tsubata, Ishikawa 92903, Japan
[2] Sophia Univ, Tokyo 102, Japan
[3] Oregon Grad Inst Sci & Technol, Portland, OR USA
[4] Int Comp Sci Inst, Berkeley, CA 94704 USA
[5] AT&T Labs W, Menlo Pk, CA USA
Keywords
modulation frequency; modulation spectrum; automatic speech recognition
DOI
10.1016/S0167-6393(99)00002-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy. (C) 1999 Elsevier Science B.V. All rights reserved.
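The band-pass filtering of feature time trajectories described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the filters the authors used, so this example swaps in an ideal filter built by zeroing DFT bins, which is an assumption made purely for illustration. The function names `bandpass_trajectory` and `bandpass_features` are hypothetical, and the 1 to 16 Hz defaults simply mirror the range the abstract reports as most useful.

```python
import cmath
import math


def bandpass_trajectory(traj, frame_rate_hz, low_hz=1.0, high_hz=16.0):
    """Keep only modulation frequencies in [low_hz, high_hz] of one trajectory.

    traj is the time trajectory of a single feature (e.g. one log filter-bank
    energy), sampled at frame_rate_hz frames per second.
    """
    n = len(traj)
    # Forward DFT of the trajectory (O(n^2), acceptable for a short illustration).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(traj))
        for k in range(n)
    ]
    for k in range(n):
        # Modulation frequency of bin k; bins above n/2 mirror negative frequencies.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0.0
    # Inverse DFT; the mask is symmetric, so the result is real up to rounding.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def bandpass_features(frames, frame_rate_hz, low_hz=1.0, high_hz=16.0):
    """Filter every feature dimension of a list of frames along the time axis."""
    # Transpose frames into per-feature trajectories, filter each, transpose back.
    trajectories = [list(band) for band in zip(*frames)]
    filtered = [
        bandpass_trajectory(tr, frame_rate_hz, low_hz, high_hz)
        for tr in trajectories
    ]
    return [list(frame) for frame in zip(*filtered)]
```

With features computed at an assumed 100 frames per second, calling `bandpass_features(frames, 100.0)` removes both the constant offset and slow drift below 1 Hz and the fast fluctuations above 16 Hz from each trajectory. A practical front end would more likely use a causal FIR or IIR filter; the DFT mask is chosen here only because it makes the pass band explicit.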
Pages: 43-55
Number of pages: 13
Related papers
12 items in total
[1]  
Arai T, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P2490, DOI 10.1109/ICSLP.1996.607318
[2]   EFFECTIVENESS OF LINEAR PREDICTION CHARACTERISTICS OF SPEECH WAVE FOR AUTOMATIC SPEAKER IDENTIFICATION AND VERIFICATION [J].
ATAL, BS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1974, 55 (06) :1304-1312
[3]   EFFECT OF REDUCING SLOW TEMPORAL MODULATIONS ON SPEECH RECEPTION [J].
DRULLMAN, R ;
FESTEN, JM ;
PLOMP, R .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (05) :2670-2680
[4]   EFFECT OF TEMPORAL ENVELOPE SMEARING ON SPEECH RECEPTION [J].
DRULLMAN, R ;
FESTEN, JM ;
PLOMP, R .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (02) :1053-1064
[5]   SPEAKER-INDEPENDENT ISOLATED WORD RECOGNITION USING DYNAMIC FEATURES OF SPEECH SPECTRUM [J].
FURUI, S .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1986, 34 (01) :52-59
[6]  
Greenberg S., 1996, P ESCA WORKSH AUD BA, P1
[7]   PERCEPTUAL LINEAR PREDICTIVE (PLP) ANALYSIS OF SPEECH [J].
HERMANSKY, H .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 87 (04) :1738-1752
[8]  
HERMANSKY H, 1993, P INT C AC SPEECH SI
[9]   RASTA Processing of Speech [J].
Hermansky, Hynek ;
Morgan, Nelson .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04) :578-589
[10]   A REVIEW OF THE MTF CONCEPT IN ROOM ACOUSTICS AND ITS USE FOR ESTIMATING SPEECH-INTELLIGIBILITY IN AUDITORIA [J].
HOUTGAST, T ;
STEENEKEN, HJM .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1985, 77 (03) :1069-1077