Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Cited by: 9

Authors:

Fedila, M. ^{[1,2]}

Bengherabi, M. ^{[2]}

Amrouche, A. ^{[1]}

Affiliations:

[1] USTHB, Fac Elect & Comp Sci, Algiers, Algeria

[2] Ctr Dev Technol Avancees, Algiers, Algeria

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2018 / Vol. 77 / Issue 13

Keywords:

Gammatone filter-bank; Group-delay; Automatic speaker verification; GMM-UBM; G722.2; GROUP DELAY FEATURE; RECOGNITION;

DOI:

10.1007/s11042-017-5237-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The main novelty of this work lies in incorporating a Gammatone filter-bank as a substitute for the Mel filter-bank in the extraction pipeline of the Product Spectrum (PS). The proposed feature is dubbed the Gammatone Product-Spectrum Cepstral Coefficients (GPSCC). Experiments are conducted on the TIMIT and noisy TIMIT corpora using the Gaussian Mixture Model with Universal Background Model (GMM-UBM) recognition algorithm. Performance evaluations indicate that GPSCC yields a drastic reduction in Equal Error Rate (EER) compared with other related features, and that this gain is more pronounced at low signal-to-noise ratios. Our study also demonstrates the merit of the Gammatone filter-bank in improving robustness to codec-degraded speech at different bit rates. Furthermore, the proposed GPSCC feature achieves the best verification performance under aggressive compression. Notably, at 6.60 kbps GPSCC achieves an absolute error reduction of 12% compared with the Mel Frequency Cepstral Coefficients (MFCC).
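The abstract describes the pipeline only in outline, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It computes a product spectrum using the standard identity Q(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k), where X is the DFT of x(n) and Y is the DFT of n·x(n), weights it with an approximate frequency-domain Gammatone filter-bank, and applies a log and DCT. The function names, the filter count, the ERB-based bandwidth factor, the Hamming window, and the use of the absolute value before the log are all illustrative assumptions that do not come from this record.

```python
import numpy as np

def product_spectrum(frame, n_fft=512):
    """Q(k) = X_R*Y_R + X_I*Y_I, with X = DFT{x(n)} and Y = DFT{n*x(n)}."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return X.real * Y.real + X.imag * Y.imag

def erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def inverse_erb_rate(e):
    """Inverse of erb_rate, returning frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_weights(n_filters, n_fft, sample_rate, f_min=100.0, order=4):
    """Approximate magnitude responses of a Gammatone filter-bank.

    Assumption: centre frequencies equally spaced on the ERB-rate scale and
    bandwidth 1.019 * ERB(fc); the paper's actual settings are not known here.
    """
    f_max = sample_rate / 2.0
    centres = inverse_erb_rate(
        np.linspace(erb_rate(f_min), erb_rate(f_max), n_filters))
    bandwidths = 1.019 * 24.7 * (4.37 * centres / 1000.0 + 1.0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    ratio = (freqs[None, :] - centres[:, None]) / bandwidths[:, None]
    return (1.0 + ratio ** 2) ** (-order / 2.0)

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II of a 1-D vector, keeping the first n_coeffs terms."""
    m = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(m)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * m))
    return x @ basis.T

def gpscc_like(frame, sample_rate, n_filters=32, n_coeffs=13, n_fft=512):
    """Hypothetical GPSCC-style feature for a single speech frame."""
    windowed = frame * np.hamming(len(frame))
    q = product_spectrum(windowed, n_fft)
    # Assumption: the product spectrum can be negative, so its magnitude is
    # used here to keep the logarithm defined.
    energies = gammatone_weights(n_filters, n_fft, sample_rate) @ np.abs(q)
    return dct_ii(np.log(energies + 1e-10), n_coeffs)
```

A quick sanity check of the product-spectrum identity: a unit impulse delayed by d samples has a flat power spectrum and a constant group delay d, so `product_spectrum` should return d in every bin.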

Citation

Pages: 16721-16739

Number of pages: 19

42 references in total

[1] Short-time phase spectrum in speech processing: A review and some experimental results [J].

Alsteris, Leigh D. ;

Paliwal, Kuldip K. .

DIGITAL SIGNAL PROCESSING, 2007, 17 (03) :578-616

[2]

[Anonymous], 2004 IEEE INT C AC S

[3]

[Anonymous], 1997, P EUR 97 RHOD GREEC

[4] Improving the self-adaptive voice activity detector for speaker verification using map adaptation and asymmetric tapers [J].

Asbai N. ;

Bengherabi M. ;

Amrouche A. ;

Aklouf Y. .

International Journal of Speech Technology, 2015, 18 (02) :195-203

[5]

Boulkenafet Z, 2013, 2013 INT C BIOSIG SP, P241

[6] Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006 [J].

Bruemmer, Niko ;

Burget, Lukas ;

Cernocky, Jan 'Honza' ;

Glembek, Ondrej ;

Grezl, Frantisek ;

Karafiat, Martin ;

van Leeuwen, David A. ;

Matejka, Pavel ;

Schwarz, Petr ;

Strasheim, Albert .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :2072-2084

[7]

Brummer N., FOCAL TOOLS FUSION C

[8] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].

DAVIS, SB ;

MERMELSTEIN, P .

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366

[9] Front-End Factor Analysis for Speaker Verification [J].

Dehak, Najim ;

Kenny, Patrick J. ;

Dehak, Reda ;

Dumouchel, Pierre ;

Ouellet, Pierre .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798

[10] On the Effects of Filterbank Design and Energy Computation on Robust Speech Recognition [J].

Dimitriadis, Dimitrios ;

Maragos, Petros ;

Potamianos, Alexandros .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (06) :1504-1516
