Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique

Cited by: 17

Authors:

Alam, Md Jahangir [1,2]

Kenny, Patrick [2]

O'Shaughnessy, Douglas [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada

[2] CRIM, Montreal, PQ H3C 3P8, Canada

Source:

DIGITAL SIGNAL PROCESSING | 2014 / Vol. 29

Keywords:

Speech recognition; Compressive gammachirp; Auditory spectrum enhancement; Feature normalization; SPEECH; NOISE; COMPENSATION; RECOGNITION; SUPPRESSION; ADAPTATION; MODEL;

DOI:

10.1016/j.dsp.2014.03.001

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this paper we introduce a robust feature extractor, dubbed robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric, level-dependent compressive gammachirp filterbank and a sigmoid-shaped weighting rule for enhancing speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. As a post-processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of the cepstral features, reduces the mismatch between cepstra in the training and test environments. To evaluate the proposed feature extractor for speech recognition, we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions, and additive noise combined with different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results demonstrate that the proposed method is robust to both additive noise and reverberant environments. The proposed method provides results comparable to those of the ETSI-AFE and PNCC on the AURORA-2 and AURORA-4 corpora and provides considerable improvements over the other feature extractors on the AURORA-5 corpus. (c) 2014 Elsevier Inc. All rights reserved.
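
The abstract describes STCMSN only as adjusting the mean and scale of cepstral features over a short time span. The following is a minimal sketch of that general idea, not the paper's implementation. It assumes a centered sliding window, and it assumes the scale is the per-coefficient range (maximum minus minimum) inside the window. The function name, the window length, and the `eps` guard are all hypothetical choices made for illustration; the paper's exact definition may differ.

```python
import numpy as np


def short_time_mean_scale_norm(cepstra, window=151, eps=1e-8):
    """Sliding-window mean and scale normalization (illustrative sketch).

    cepstra: array of shape (n_frames, n_coeffs).
    window:  window length in frames (assumed value, not from the paper).
    eps:     small constant to avoid division by zero (assumption).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]
    half = window // 2
    out = np.empty_like(cepstra)
    for t in range(n_frames):
        # Centered window, truncated at the utterance boundaries.
        lo = max(0, t - half)
        hi = min(n_frames, t + half + 1)
        seg = cepstra[lo:hi]
        mean = seg.mean(axis=0)
        # Assumed scale: per-coefficient range within the window.
        scale = seg.max(axis=0) - seg.min(axis=0)
        out[t] = (cepstra[t] - mean) / (scale + eps)
    return out
```

Under these assumptions the output is approximately unchanged when the input features are shifted by a constant or multiplied by a positive gain, which is the kind of train/test mismatch the abstract says the normalization is meant to reduce.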

Pages: 147-157

Number of pages: 11
