SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7
Authors
Yao, Qi [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;
DOI
10.1109/LSP.2018.2870726
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
A major challenge in speaker verification is to achieve low error rates in noisy environments. We observed that background noise in utterances not only enlarges the speaker-dependent i-vector clusters but also shifts the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: hierarchical regression DNN (H-RDNN) and multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs, in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN aims to regularize the outliers that cannot be denoised properly by the lower DNN. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures together with the PLDA backend significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
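The multitask idea in the abstract (a denoising main task plus a speaker-classification auxiliary task sharing one network) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single shared tanh layer, the layer sizes, the squared-error and cross-entropy losses, the weight `alpha`, and all function and variable names.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in, scale=0.1):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(params, noisy):
    """Shared hidden layer feeding a denoising head and a speaker head."""
    hidden = [math.tanh(v) for v in linear(noisy, *params["shared"])]
    denoised = linear(hidden, *params["denoise"])   # main task output
    logits = linear(hidden, *params["classify"])    # auxiliary task output
    return denoised, logits

def multitask_loss(params, noisy, clean_target, speaker_id, alpha=0.5):
    """Squared error (main task) plus alpha times cross-entropy (auxiliary)."""
    denoised, logits = forward(params, noisy)
    sq_err = sum((d - t) ** 2 for d, t in zip(denoised, clean_target))
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    ce = log_norm - logits[speaker_id]
    return sq_err + alpha * ce, sq_err, ce

# Toy data: an 8-dimensional stand-in for an i-vector and 3 speakers.
rng = random.Random(0)
dim, hidden_units, n_speakers = 8, 16, 3
params = {
    "shared": init_layer(rng, hidden_units, dim),
    "denoise": init_layer(rng, dim, hidden_units),
    "classify": init_layer(rng, n_speakers, hidden_units),
}
noisy = [rng.gauss(0.0, 1.0) for _ in range(dim)]
clean_target = [rng.gauss(0.0, 1.0) for _ in range(dim)]
total, sq_err, ce = multitask_loss(params, noisy, clean_target, speaker_id=1)
print(total, sq_err, ce)
```

Because both heads read the same hidden representation, minimising the combined loss pushes that representation to keep speaker-discriminative information while denoising, which is the role the abstract gives to the auxiliary task. Setting `alpha` to 0 in this sketch reduces it to a plain regression network.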
Pages: 1670-1674
Page count: 5
Related papers
50 in total
  • [1] SNR-Invariant PLDA Modeling for Robust Speaker Verification
    Li, Na
    Mak, Man-Wai
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2317 - 2321
  • [2] SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification
    Li, Na
    Mak, Man-Wai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (10) : 1648 - 1659
  • [3] SNR-INVARIANT PLDA WITH MULTIPLE SPEAKER SUBSPACES
    Li, Na
    Mak, Man-Wai
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5565 - 5569
  • [4] MODELLING SPEAKER AND CHANNEL VARIABILITY USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION
    Bhattacharya, Gautam
    Alam, Jahangir
    Kenny, Patrick
    Gupta, Vishwa
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 192 - 198
  • [5] DNN-Based Score Calibration With Multitask Learning for Noise Robust Speaker Verification
    Tan, Zhili
    Mak, Man-Wai
    Mak, Brian Kan-Wing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (04) : 700 - 712
  • [6] Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA
    Pang, Xiaomin
    Mak, Man-Wai
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2015, 18 (04) : 633 - 648
  • [7] Fusion of SNR-Dependent PLDA Models for Noise Robust Speaker Verification
    Pang, Xiaomin
    Mak, Man-Wai
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 619 - 623
  • [8] ASVtorch toolkit: Speaker verification with deep neural networks
    Lee, Kong Aik
    Vestman, Ville
    Kinnunen, Tomi
    SOFTWAREX, 2021, 14
  • [9] Investigation of Bottleneck Features and Multilingual Deep Neural Networks for Speaker Verification
    Tian, Yao
    Cai, Meng
    He, Liang
    Liu, Jia
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1151 - 1155
  • [10] Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification
    Li, Na
    Mak, Man-Wai
    Lin, Wei-Wei
    Chien, Jen-Tzung
    COMPUTER SPEECH AND LANGUAGE, 2017, 45 : 83 - 103