Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Cited by: 0
Authors
M. Fedila
M. Bengherabi
A. Amrouche
Affiliations
[1] USTHB, Faculty of Electronics and Computer Sciences
[2] Centre de Développement des Technologies Avancées
Source
Multimedia Tools and Applications | 2018 / Vol. 77
Keywords
Gammatone filter-bank; Group-delay; Automatic speaker verification; GMM-UBM; G722.2
DOI
Not available
Abstract
The main novelty of this work is the use of a Gammatone filter-bank in place of the Mel filter-bank in the extraction pipeline of the Product Spectrum (PS). The proposed feature is termed Gammatone Product-Spectrum Cepstral Coefficients (GPSCC). Experiments are conducted on the TIMIT and noisy TIMIT corpora using the Gaussian Mixture Model with Universal Background Model (GMM-UBM) recognition algorithm. The evaluations indicate that GPSCC yields a drastic reduction in Equal Error Rate (EER) compared with other related features, and that this gain is more pronounced at low signal-to-noise ratios. The study also demonstrates the merit of the Gammatone filter-bank in improving robustness to codec-degraded speech at different bit rates, and GPSCC achieves the best verification performance under aggressive compression. Notably, at 6.60 kbps GPSCC achieves an absolute error reduction of 12% compared with Mel Frequency Cepstral Coefficients (MFCC).
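The pipeline described in the abstract can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation. It assumes the commonly used definition of the product spectrum (power spectrum times group delay, which reduces to Q(k) = X_R(k)Y_R(k) + X_I(k)Y_I(k), where Y is the DFT of n·x(n)). It also assumes a frequency-domain approximation of a fourth-order gammatone magnitude response with centre frequencies spaced on the ERB-rate scale. The function names, the flooring constant, the number of filters, and the number of cepstral coefficients are hypothetical choices made for the example.

```python
import numpy as np

def erb_space(f_low, f_high, n_filters):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erb(np.linspace(to_erb(f_low), to_erb(f_high), n_filters))

def gammatone_weights(n_filters, n_fft, sample_rate, f_low=50.0, order=4):
    """Approximate gammatone weights, shape (n_filters, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fc = erb_space(f_low, sample_rate / 2.0, n_filters)[:, None]
    # Bandwidth of 1.019 ERB(fc); an assumed, commonly quoted setting.
    bandwidth = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)
    # Approximate squared magnitude response of an order-n gammatone filter.
    return (1.0 + ((freqs[None, :] - fc) / bandwidth) ** 2) ** (-order)

def product_spectrum(frame, n_fft):
    """Q(k) = X_R Y_R + X_I Y_I, with Y the DFT of n * x(n)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return X.real * Y.real + X.imag * Y.imag

def gpscc_like(frame, sample_rate, n_fft=512, n_filters=32, n_ceps=13,
               floor=1e-10):
    """Cepstral coefficients from a gammatone-weighted product spectrum."""
    windowed = frame * np.hamming(len(frame))
    # The product spectrum can be negative, so it is floored before the log
    # (the floor value is an assumption of this sketch).
    q = np.maximum(product_spectrum(windowed, n_fft), floor)
    energies = gammatone_weights(n_filters, n_fft, sample_rate) @ q
    log_energies = np.log(energies)
    # Unnormalised DCT-II written out explicitly to avoid extra dependencies.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n_filters)
    return basis @ log_energies

# Example: one 25 ms frame of a 440 Hz tone sampled at 16 kHz.
t = np.arange(400) / 16000.0
coefficients = gpscc_like(np.sin(2.0 * np.pi * 440.0 * t), 16000)
```

Swapping `gammatone_weights` for a triangular Mel filter-bank in this sketch would give a Mel-based product-spectrum feature, which is the kind of baseline the abstract compares against.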
Pages: 16721-16739
Page count: 18