Learnable MFCCs for Speaker Verification

Cited by: 5
Authors
Liu, Xuechen [1, 2]
Sahidullah, Md [2]
Kinnunen, Tomi [1]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland
[2] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
Source
2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) | 2021
Funding
Academy of Finland
Keywords
Speaker verification; feature extraction; mel-frequency cepstral coefficients (MFCCs); RECOGNITION; FEATURES
DOI
10.1109/ISCAS51556.2021.9401593
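Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract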
We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to adapt flexibly to data. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank, and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.
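To make the "four linear transforms" concrete, the sketch below writes a standard (static) MFCC extractor so that each stage is an explicit array: a window vector, a DFT matrix, a mel filterbank matrix, and a DCT matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Hamming window, the HTK-style mel formula, the DCT-II form, and the sizes (512-point frames, 40 mel filters, 20 coefficients, 16 kHz) are all choices made here. A learnable front-end in the spirit of the abstract would presumably initialize trainable parameters from arrays like these and update them during training; the abstract does not say how the parameters are constrained.

```python
import numpy as np

def hamming_window(n_fft):
    # Transform 1: windowing, an element-wise multiply (a diagonal linear map).
    n = np.arange(n_fft)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (n_fft - 1))

def dft_matrix(n_fft):
    # Transform 2: one-sided DFT as a complex matrix of shape (n_fft//2 + 1, n_fft).
    k = np.arange(n_fft // 2 + 1)[:, None]
    n = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * n / n_fft)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Transform 3: triangular filters equally spaced on the mel scale
    # (HTK-style mel formula, assumed here for illustration).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    edges_hz = mel_to_hz(edges_mel)
    freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, ctr, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n_ceps, n_mels):
    # Transform 4: unnormalized DCT-II basis of shape (n_ceps, n_mels).
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_mels)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n_mels))

def mfcc(frames, window, dft, fb, dct, eps=1e-10):
    # frames: (num_frames, n_fft) array of already-framed audio samples.
    spectrum = (frames * window) @ dft.T   # windowing, then DFT
    power = np.abs(spectrum) ** 2          # power spectrum (nonlinear step)
    mel_energy = power @ fb.T              # mel filterbank
    return np.log(mel_energy + eps) @ dct.T  # log compression, then DCT

# Usage with random data standing in for real speech frames.
frames = np.random.default_rng(0).standard_normal((3, 512))
feats = mfcc(frames, hamming_window(512), dft_matrix(512),
             mel_filterbank(40, 512, 16000), dct_matrix(20, 40))
print(feats.shape)
```

Writing the stages as arrays rather than calling a packaged MFCC routine is what makes the data-driven variant easy to picture: each array is a drop-in candidate for a trainable parameter, while the magnitude-squared and logarithm steps stay fixed.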
Pages: 5