SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7
Authors
Yao, Qi [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;
DOI
10.1109/LSP.2018.2870726
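Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes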
0808 ; 0809 ;
Abstract
A major challenge in speaker verification is to achieve low error rates in noisy environments. We observed that background noise in utterances will not only enlarge the speaker-dependent i-vector clusters but also shift the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: hierarchical regression DNN (H-RDNN) and multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN aims to regularize the outliers that cannot be denoised properly by the lower DNN. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures together with the PLDA backend significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
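The multitask idea in the abstract (a shared network with a denoising output as the main task and a speaker-classification output as the auxiliary task) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single tanh hidden layer, the layer sizes, the squared-error and cross-entropy losses, the weighting factor `alpha`, and all function and parameter names are assumptions made only for illustration.

```python
import numpy as np

def init_params(rng, ivec_dim, hidden_dim, n_speakers, scale=0.1):
    """Randomly initialise a shared hidden layer and two output heads (hypothetical sizes)."""
    return {
        "W_h": rng.normal(0.0, scale, (ivec_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_reg": rng.normal(0.0, scale, (hidden_dim, ivec_dim)),    # denoising head
        "b_reg": np.zeros(ivec_dim),
        "W_cls": rng.normal(0.0, scale, (hidden_dim, n_speakers)),  # speaker head
        "b_cls": np.zeros(n_speakers),
    }

def forward(params, noisy_ivecs):
    """Shared hidden layer feeding a regression head and a classification head."""
    hidden = np.tanh(noisy_ivecs @ params["W_h"] + params["b_h"])
    denoised = hidden @ params["W_reg"] + params["b_reg"]
    logits = hidden @ params["W_cls"] + params["b_cls"]
    return denoised, logits

def multitask_loss(denoised, logits, clean_targets, speaker_ids, alpha=0.5):
    """Main-task squared error plus alpha times auxiliary cross-entropy (alpha is an assumption)."""
    mse = np.mean(np.sum((denoised - clean_targets) ** 2, axis=1))
    # Numerically stable log-softmax over the speaker logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    xent = -np.mean(log_probs[np.arange(len(speaker_ids)), speaker_ids])
    return mse + alpha * xent, mse, xent

# Toy usage with random data standing in for noisy i-vectors and clean targets.
rng = np.random.default_rng(0)
params = init_params(rng, ivec_dim=8, hidden_dim=16, n_speakers=4)
noisy = rng.normal(size=(5, 8))
clean = rng.normal(size=(5, 8))
speakers = np.array([0, 1, 2, 3, 0])
denoised, logits = forward(params, noisy)
total, mse, xent = multitask_loss(denoised, logits, clean, speakers)
```

In this sketch both heads read the same hidden representation, so gradients from the speaker-classification term would also shape the features used for denoising, which is one plausible way to read the abstract's statement that the auxiliary task helps retain speaker information.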
Pages: 1670-1674
Number of pages: 5