Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique

Cited by: 17
Authors
Alam, Md Jahangir [1, 2]
Kenny, Patrick [2]
O'Shaughnessy, Douglas [1]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada
[2] CRIM, Montreal, PQ H3C 3P8, Canada
Keywords
Speech recognition; Compressive gammachirp; Auditory spectrum enhancement; Feature normalization; SPEECH; NOISE; COMPENSATION; RECOGNITION; SUPPRESSION; ADAPTATION; MODEL
DOI
10.1016/j.dsp.2014.03.001
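Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract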
In this paper we introduce a robust feature extractor, dubbed robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric, level-dependent compressive gammachirp filterbank and a sigmoid-shaped weighting rule for enhancing speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. As a post-processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of the cepstral features, reduces the mismatch between the cepstra of the training and test environments. To evaluate the proposed feature extractor for speech recognition, we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions, and additive noise combined with different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results demonstrate that the proposed method is robust in both additive-noise and reverberant environments. It provides results comparable to those of the ETSI-AFE and PNCC on the AURORA-2 and AURORA-4 corpora and considerable improvements over the other feature extractors on the AURORA-5 corpus. (c) 2014 Elsevier Inc. All rights reserved.
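The abstract describes STCMSN only as adjusting the mean and scale of cepstral features over a short time span. A minimal sketch of that idea follows, under stated assumptions: the function name `short_time_mean_scale_norm`, the window length, and the choice of the in-window range (max minus min) as the "scale" are illustrative and are not taken from the abstract, so this should not be read as the authors' exact formulation.

```python
from typing import List, Sequence


def short_time_mean_scale_norm(
    frames: Sequence[Sequence[float]],
    window: int = 5,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Normalize each cepstral dimension over a sliding window around each frame.

    ASSUMPTION: "scale" is taken to be the (max - min) range inside the window;
    the abstract does not specify how the scale is computed.
    """
    n_frames = len(frames)
    half = window // 2
    out: List[List[float]] = []
    for t in range(n_frames):
        # Window is truncated at the start and end of the utterance.
        lo = max(0, t - half)
        hi = min(n_frames, t + half + 1)
        segment = frames[lo:hi]
        row: List[float] = []
        for d in range(len(frames[t])):
            values = [f[d] for f in segment]
            mean = sum(values) / len(values)
            scale = max(values) - min(values)
            # eps guards against division by zero when the window is constant.
            row.append((frames[t][d] - mean) / (scale + eps))
        out.append(row)
    return out


if __name__ == "__main__":
    toy = [[0.0], [1.0], [2.0]]  # three frames, one cepstral coefficient each
    print(short_time_mean_scale_norm(toy, window=3))
```

Because both the local mean and the local scale are removed, a constant offset or gain applied to the features (a crude stand-in for a train/test mismatch) leaves the normalized output essentially unchanged, which is the effect the abstract attributes to the normalization step.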
Pages: 147-157
Number of pages: 11