Learnable MFCCs for Speaker Verification

Cited by: 5

Authors:

Liu, Xuechen [1,2]

Sahidullah, Md [2]

Kinnunen, Tomi [1]

Affiliations:

[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland

[2] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France

Source:

2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) | 2021

Funding:

Academy of Finland

Keywords:

Speaker verification; feature extraction; mel-frequency cepstral coefficients (MFCCs); RECOGNITION; FEATURES

DOI:

10.1109/ISCAS51556.2021.9401593

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.
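For orientation, the four linear transforms named in the abstract can be written as explicit matrices in a conventional, fixed ("static") MFCC extractor. The sketch below is a textbook-style NumPy illustration and not the authors' implementation: the function names, the Hamming window, the triangular filter shape, the unnormalized DCT-II, and all parameter values (16 kHz sampling rate, 40 filters, 20 coefficients) are assumptions made for this example. A learnable front-end along the lines the abstract describes would presumably treat some of these matrices as trainable parameters initialized from such fixed values; that part is not shown here.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions; assumed here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle = pointwise minimum of the two slopes, clipped at zero.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis, shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)

def static_mfcc(frames, sample_rate=16000, n_filters=40, n_ceps=20):
    """frames: (n_frames, frame_len) array of already-framed samples."""
    frame_len = frames.shape[1]
    # 1) Windowing: elementwise product, i.e. a diagonal linear map.
    window = np.hamming(frame_len)
    # 2) DFT: explicit matrix for the non-negative frequency bins.
    t = np.arange(frame_len)
    k = np.arange(frame_len // 2 + 1)[:, None]
    dft = np.exp(-2j * np.pi * k * t / frame_len)
    # 3) Mel filterbank and 4) DCT, both plain matrices.
    mel = mel_filterbank(n_filters, frame_len, sample_rate)
    dct = dct_matrix(n_ceps, n_filters)

    spectrum = (frames * window) @ dft.T
    power = np.abs(spectrum) ** 2            # nonlinearity between 2) and 3)
    log_mel = np.log(power @ mel.T + 1e-10)  # nonlinearity between 3) and 4)
    return log_mel @ dct.T
```

Calling `static_mfcc` on an array of shape `(n_frames, 400)` returns an array of shape `(n_frames, 20)` with the defaults above. Writing the DFT as a matrix rather than calling an FFT routine is slower, but it makes explicit that each of the four stages is a linear map that could in principle be replaced by a learned one.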


Pages: 5
