Robust AM-FM features for speech recognition

Cited by: 71
Authors
Dimitriadis, D [1]
Maragos, P
Potamianos, A
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, GR-15773 Athens, Greece
[2] Tech Univ Crete, Dept Elect & Comp Engn, Khania 73100, Crete, Greece
Keywords
AM-FM; ASR; features; nonlinear; speech;
DOI
10.1109/LSP.2005.853050
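Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;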
Abstract
In this letter, a nonlinear AM-FM speech model is used to extract robust features for speech recognition. The proposed features measure the amount of amplitude and frequency modulation that exists in speech resonances and attempt to model aspects of the speech acoustic information that the commonly used linear source-filter model fails to capture. The robustness and discriminability of the AM-FM features are investigated in combination with mel cepstrum coefficients (MFCCs). It is shown that these hybrid features perform well in the presence of noise, both in terms of phoneme discrimination (J-measure) and in terms of speech recognition performance in several different tasks. An average relative error rate reduction of up to 11% for clean and 46% for mismatched noisy conditions is achieved when AM-FM features are combined with MFCCs.
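As an illustration of what measuring amplitude and frequency modulation in a resonance can look like, the sketch below demodulates a single band-limited signal with the Teager-Kaiser energy operator and the DESA-2 energy separation algorithm. This is a minimal sketch under stated assumptions, not the feature extraction pipeline of the letter: the function names `teager_energy` and `desa2`, the `eps` guard, and the synthetic test tone are hypothetical choices made for this example, and the input is assumed to be already bandpass filtered around one resonance.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns an array two samples shorter than x (the edge samples are dropped).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous amplitude and frequency.

    x is assumed (for this sketch) to be a single band-limited AM-FM
    component. Returns (amplitude, omega), where omega is in radians per
    sample; both arrays are four samples shorter than x.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)[1:-1]      # energy of x, trimmed to align with psi_y
    y = x[2:] - x[:-2]                  # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_y = teager_energy(y)            # energy of the difference signal

    # Guard against division by zero and square roots of negative energies
    # (hypothetical safeguard, not part of the algorithm definition).
    psi_x = np.maximum(psi_x, eps)
    psi_y = np.maximum(psi_y, eps)

    # omega(n) ~= 0.5 * arccos(1 - psi[y] / (2 * psi[x]))
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)

    # |a(n)| ~= 2 * psi[x] / sqrt(psi[y])
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)
    return amplitude, omega


if __name__ == "__main__":
    # Synthetic constant-amplitude, constant-frequency tone (illustrative only).
    n = np.arange(200)
    amp, omega = desa2(0.7 * np.cos(0.3 * n + 0.1))
    print(amp.mean(), omega.mean())
```

For a pure cosine the estimates are exact up to rounding. On real speech the amplitude and frequency tracks vary over time, and statistics of those tracks per analysis frame are one plausible way to quantify the amount of modulation.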
Pages: 621-624
Page count: 4