Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition

Cited by: 0
Authors
Adiga, Aniruddha [1]
Magimai-Doss, Mathew [2]
Seelamantula, Chandra Sekhar [1]
Affiliations
[1] Indian Inst Sci, Dept Elect Engn, Bangalore 560012, Karnataka, India
[2] Idiap Res Inst, Martigny, Switzerland
Source
2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON) | 2013
Funding
Swiss National Science Foundation
Keywords
Gammatone wavelets; Auditory modeling; Cepstral coefficients; Speech recognition; REPRESENTATIONS;
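DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract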
We develop noise-robust features using Gammatone wavelets derived from the popular Gammatone functions. These wavelets incorporate characteristics of the human peripheral auditory system, in particular the spatially varying frequency response of the basilar membrane. We refer to the new features as Gammatone Wavelet Cepstral Coefficients (GWCC). The procedure for extracting GWCCs from a speech signal is similar to that of the conventional Mel-Frequency Cepstral Coefficients (MFCC) technique; the difference lies in the type of filterbank used. We replace the conventional mel filterbank in MFCC with a Gammatone wavelet filterbank, which we construct using Gammatone wavelets. We also explore the effect of Gammatone filterbank based features, Gammatone Cepstral Coefficients (GCC), for robust speech recognition. On the AURORA 2 database, a comparison of GWCCs and GCCs with MFCCs shows that Gammatone-based features yield better recognition performance at low SNRs.
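The MFCC-style pipeline described in the abstract (frame, window, power spectrum, filterbank, log, DCT) can be sketched with the mel filterbank swapped for a gammatone one. This is only an illustration, not the authors' implementation. The Gammatone wavelet construction is not given in this record, so the sketch uses a plain gammatone filterbank, which is closer to the GCC variant. All function names, the filter count, frame sizes, and frequency range are assumptions, and the filter shape is the common approximation of the gammatone magnitude response with Glasberg-Moore ERB bandwidths.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_spaced_centers(f_lo, f_hi, n_filters):
    """Center frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n_filters))

def gammatone_weights(fs, n_fft, centers, order=4, b=1.019):
    """Approximate squared magnitude responses, shape (n_filters, n_fft//2 + 1).

    Uses |H(f)|^2 ~ [1 + ((f - fc) / (b * ERB(fc)))^2]^(-order), an assumed
    approximation that ignores the negative-frequency term.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fc = np.asarray(centers)[:, None]
    return (1.0 + ((freqs[None, :] - fc) / (b * erb(fc))) ** 2) ** (-order)

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

def gammatone_cepstra(signal, fs, n_filters=23, n_ceps=13,
                      frame_len=0.025, hop=0.010, n_fft=256, f_lo=100.0):
    """Cepstral features from a gammatone filterbank (hypothetical parameters)."""
    flen, fhop = int(round(frame_len * fs)), int(round(hop * fs))
    n_frames = 1 + (len(signal) - flen) // fhop
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)            # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2    # per-frame power spectrum
    centers = erb_spaced_centers(f_lo, fs / 2.0, n_filters)
    energies = power @ gammatone_weights(fs, n_fft, centers).T
    return dct2(np.log(energies + 1e-10), n_ceps)      # log compression, then DCT

# Usage on one second of synthetic noise at 8 kHz (an assumed sampling rate).
rng = np.random.default_rng(0)
features = gammatone_cepstra(rng.standard_normal(8000), fs=8000)
print(features.shape)
```

Applying the filters as weights on the power spectrum keeps the structure parallel to the usual MFCC computation; a time-domain gammatone filterbank would be an equally valid choice.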
Pages: 4