Optimizing Multi-Taper Features for Deep Speaker Verification

Cited by: 0

Authors:

Liu, Xuechen [1,2]

Sahidullah, Md [1]

Kinnunen, Tomi [2]

Affiliations:

[1] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France

[2] Univ Eastern Finland, Sch Comp, FI-80101 Joensuu, Finland

Source:

IEEE Signal Processing Letters | 2021 / Vol. 28

Funding:

Academy of Finland

Keywords:

Feature extraction; Discrete Fourier transforms; Task analysis; Neural networks; Mel frequency cepstral coefficient; Stochastic processes; Standards; Multi-taper spectrum; Speaker verification; Recognition; MFCC

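DOI:

10.1109/LSP.2021.3122796

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract: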

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. Our method achieves a maximum improvement of 25.8% in equal error rate over the static-taper design on the SITW corpus, and it helps preserve a balanced level of spectral leakage and variance, which makes the features more robust.
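As an illustration of the static-taper starting point mentioned in the abstract, the following is a minimal sketch of a multi-taper power spectrum estimate for a single speech frame. It is not the authors' implementation: the function name `multitaper_power_spectrum`, the choice of DPSS (Slepian) tapers, the uniform default weights, and the parameter values (6 tapers, time-bandwidth product 3.5, 512-point FFT) are all assumptions made for this example. The record does not state which tapers the paper uses or which parts of the estimator are learned.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_power_spectrum(frame, num_tapers=6, nw=3.5, weights=None, n_fft=512):
    """Weighted average of several tapered periodograms for one frame.

    All defaults are illustrative assumptions, not values from the paper.
    The frame is assumed to be no longer than n_fft samples.
    """
    frame = np.asarray(frame, dtype=float)
    # DPSS tapers, shape (num_tapers, len(frame)); each has unit L2 norm.
    tapers = dpss(len(frame), nw, Kmax=num_tapers)
    if weights is None:
        # Uniform weighting: a plain average over the tapered periodograms.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    weights = np.asarray(weights, dtype=float)
    # One periodogram per taper, shape (num_tapers, n_fft // 2 + 1).
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    # Combine the periodograms into a single spectrum estimate.
    return weights @ spectra


# Example: a 25 ms frame at 16 kHz (400 samples) of white noise.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
spectrum = multitaper_power_spectrum(frame)
```

In an MFCC pipeline, `spectrum` would take the place of the single-window periodogram before the mel filterbank. Averaging several tapered periodograms lowers the variance of the estimate at the cost of a wider main lobe. A jointly optimized variant, as proposed in the abstract, would presumably treat parts of this estimator as trainable parameters of the network rather than fixing them in advance.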

Pages: 2187-2191

Page count: 5
