SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7
Authors
Yao, Qi [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA
DOI
10.1109/LSP.2018.2870726
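Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract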
A major challenge in speaker verification is to achieve low error rates in noisy environments. We observed that background noise in utterances not only enlarges the speaker-dependent i-vector clusters but also shifts them, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: the hierarchical regression DNN (H-RDNN) and the multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs: the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors, and the upper DNN aims to regularize the outliers that the lower DNN cannot denoise properly. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task); the network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures, together with the PLDA backend, significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
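The multitask idea in the abstract can be illustrated with a minimal NumPy sketch: one shared hidden layer feeds a regression head (denoising, main task) and a softmax head (speaker classification, auxiliary task), and the two losses are summed. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes, the single `tanh` hidden layer, the `aux_weight` value, the synthetic data, and the choice to use each speaker's clean cluster mean as the denoising target (the abstract states that target only for the H-RDNN).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
IVEC_DIM, HIDDEN_DIM, NUM_SPEAKERS, BATCH = 8, 16, 4, 6


def init_params():
    """Small random weights for one shared layer and two task heads."""
    return {
        "W_shared": rng.normal(0.0, 0.1, (IVEC_DIM, HIDDEN_DIM)),
        "W_denoise": rng.normal(0.0, 0.1, (HIDDEN_DIM, IVEC_DIM)),
        "W_speaker": rng.normal(0.0, 0.1, (HIDDEN_DIM, NUM_SPEAKERS)),
    }


def forward(params, noisy_ivecs):
    """Shared representation -> (denoised i-vectors, speaker logits)."""
    hidden = np.tanh(noisy_ivecs @ params["W_shared"])
    denoised = hidden @ params["W_denoise"]   # main task: regression
    logits = hidden @ params["W_speaker"]     # auxiliary task: classification
    return denoised, logits


def multitask_loss(denoised, logits, targets, speaker_ids, aux_weight=0.5):
    """Mean squared error (denoising) plus weighted cross-entropy (speaker ID)."""
    mse = np.mean(np.sum((denoised - targets) ** 2, axis=1))
    # Numerically stable log-softmax over the speaker classes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(speaker_ids)), speaker_ids])
    return mse + aux_weight * ce, mse, ce


# Synthetic stand-ins for i-vectors: each speaker has a clean cluster mean,
# and "noisy" i-vectors are that mean plus Gaussian perturbation.
speaker_ids = rng.integers(0, NUM_SPEAKERS, BATCH)
clean_means = rng.normal(0.0, 1.0, (NUM_SPEAKERS, IVEC_DIM))
targets = clean_means[speaker_ids]  # assumed denoising target
noisy_ivecs = targets + rng.normal(0.0, 0.3, (BATCH, IVEC_DIM))

params = init_params()
denoised, logits = forward(params, noisy_ivecs)
total, mse, ce = multitask_loss(denoised, logits, targets, speaker_ids)
print(f"total={total:.4f}  mse={mse:.4f}  ce={ce:.4f}")
```

The sketch only evaluates the joint loss once; training would minimize it with a gradient-based optimizer. In this toy setup, setting `aux_weight` to zero reduces the objective to plain regression, which is one way to see what the auxiliary speaker task contributes.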
Pages: 1670-1674
Page count: 5