SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7

Authors:

Yao, Qi [1]

Mak, Man-Wai [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2018 / Vol. 25 / No. 11

Keywords:

Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;

DOI:

10.1109/LSP.2018.2870726

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

A major challenge in speaker verification is to achieve low error rates under noisy environments. We observed that background noise in utterances will not only enlarge the speaker-dependent i-vector clusters but also shift the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: hierarchical regression DNN (H-RDNN) and multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN aims to regularize the outliers that cannot be denoised properly by the lower DNN. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures together with the PLDA backend significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
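
The multitask idea in the abstract (a denoising main task plus a speaker-classification auxiliary task on one network) can be illustrated with a minimal sketch. This is not the authors' implementation: the single shared hidden layer, the toy dimensions, the random weights, the function name `multitask_loss`, and the auxiliary-task weight `alpha` are all assumptions made for illustration, and the abstract does not specify the loss functions or how the two tasks are weighted.

```python
import math
import random

def linear(x, W, b):
    """Affine map; W is a list of rows, x and b are lists of floats."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def init(rows, cols, rng):
    """Small random weights (hypothetical initialisation)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multitask_loss(noisy_ivec, clean_target, speaker_id, params, alpha=0.5):
    """Forward pass of a two-head network and its combined loss.

    alpha is an assumed weight on the auxiliary task, not a value from the paper.
    """
    hidden = relu(linear(noisy_ivec, params["W_h"], params["b_h"]))  # shared layer
    denoised = linear(hidden, params["W_r"], params["b_r"])          # main task: regression head
    probs = softmax(linear(hidden, params["W_c"], params["b_c"]))    # auxiliary task: speaker posteriors
    mse = sum((d - c) ** 2 for d, c in zip(denoised, clean_target)) / len(clean_target)
    xent = -math.log(probs[speaker_id])
    return mse + alpha * xent, denoised, probs

rng = random.Random(0)
dim, hid, n_spk = 4, 8, 3  # toy sizes, assumed for illustration only
params = {
    "W_h": init(hid, dim, rng), "b_h": [0.0] * hid,
    "W_r": init(dim, hid, rng), "b_r": [0.0] * dim,
    "W_c": init(n_spk, hid, rng), "b_c": [0.0] * n_spk,
}
noisy = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for a noisy i-vector
clean = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for its clean target
loss, denoised, probs = multitask_loss(noisy, clean, speaker_id=1, params=params)
```

Because both heads read the same hidden layer, gradients from the classification term would also shape the representation used for denoising, which is one plausible reading of how the auxiliary task helps retain speaker information. Setting `alpha=0.0` reduces the sketch to a plain regression loss.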


Pages: 1670-1674

Number of pages: 5

