Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique

Cited by: 17

Authors:

Alam, Md Jahangir [1,2]

Kenny, Patrick [2]

O'Shaughnessy, Douglas [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada

[2] CRIM, Montreal, PQ H3C 3P8, Canada

Source:

DIGITAL SIGNAL PROCESSING | 2014 / Vol. 29

Keywords:

Speech recognition; Compressive gammachirp; Auditory spectrum enhancement; Feature normalization; SPEECH; NOISE; COMPENSATION; RECOGNITION; SUPPRESSION; ADAPTATION; MODEL;

DOI:

10.1016/j.dsp.2014.03.001

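Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: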

0808 ; 0809 ;

Abstract:

In this paper we introduce a robust feature extractor, dubbed robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric and level-dependent compressive gammachirp filterbank and a sigmoid-shaped weighting rule for the enhancement of speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. As a post-processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of the cepstral features, reduces the mismatch between the cepstra of the training and test environments. To evaluate the proposed feature extractor in the context of speech recognition, we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions, and additive noise combined with different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results demonstrate that the proposed method is robust to both additive noise and reverberation. The proposed method provides results comparable to those of the ETSI-AFE and PNCC on the AURORA-2 and AURORA-4 corpora and provides considerable improvements over the other feature extractors on the AURORA-5 corpus. (c) 2014 Elsevier Inc. All rights reserved.
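The abstract describes STCMSN only as adjusting the mean and scale of the cepstral features over a short time span. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a centered sliding window and takes the "scale" to be the per-coefficient range (maximum minus minimum) within the window. The function name `short_time_mean_scale_norm`, the `window` length, and the `eps` guard are all hypothetical choices made for illustration.

```python
import numpy as np


def short_time_mean_scale_norm(cepstra, window=151, eps=1e-8):
    """Sliding-window mean and scale normalization of cepstral features.

    cepstra: array of shape (n_frames, n_coeffs).
    window:  number of frames in the centered window (assumed value).
    eps:     small constant that avoids division by zero (assumed value).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]
    half = window // 2
    out = np.empty_like(cepstra)
    for t in range(n_frames):
        # Clip the window at the utterance boundaries.
        lo = max(0, t - half)
        hi = min(n_frames, t + half + 1)
        seg = cepstra[lo:hi]
        mean = seg.mean(axis=0)
        # Assumption: "scale" is the range of each coefficient in the window.
        scale = seg.max(axis=0) - seg.min(axis=0)
        out[t] = (cepstra[t] - mean) / (scale + eps)
    return out
```

Under these assumptions the output is unchanged when every frame is multiplied by a positive gain and shifted by a constant offset, which is the kind of train/test mismatch such a normalization is meant to reduce.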


Pages: 147-157

Page count: 11
