SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7

Authors:

Yao, Qi [1]

Mak, Man-Wai [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2018 / Vol. 25 / No. 11

Keywords:

Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;

DOI:

10.1109/LSP.2018.2870726

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

A major challenge in speaker verification is to achieve low error rates under noisy environments. We observed that background noise in utterances will not only enlarge the speaker-dependent i-vector clusters but also shift the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: hierarchical regression DNN (H-RDNN) and multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN aims to regularize the outliers that cannot be denoised properly by the lower DNN. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures together with the PLDA backend significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
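
The multitask idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single shared tanh layer, the layer sizes, the weight initialisation, and the `aux_weight` factor are all assumptions made for illustration, and every function name here is hypothetical. The sketch computes speaker-dependent means of clean i-vectors (the regression targets mentioned in the abstract) and combines a denoising loss (main task) with a speaker-classification loss (auxiliary task).

```python
import math
import random


def speaker_means(clean_ivecs, speaker_ids):
    """Average the clean i-vectors of each speaker (regression targets)."""
    sums, counts = {}, {}
    for vec, spk in zip(clean_ivecs, speaker_ids):
        acc = sums.setdefault(spk, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[spk] = counts.get(spk, 0) + 1
    return {spk: [v / counts[spk] for v in acc] for spk, acc in sums.items()}


def init_layer(n_in, n_out, rng):
    """Small random weights (one row per output unit) and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: dot product of each weight row with x, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(params, noisy_ivec):
    """Shared hidden layer feeding a denoising head and a speaker head."""
    hidden = [math.tanh(h) for h in linear(noisy_ivec, *params["shared"])]
    denoised = linear(hidden, *params["denoise"])
    logits = linear(hidden, *params["speaker"])
    return denoised, logits


def multitask_loss(denoised, target_mean, logits, speaker_index, aux_weight=0.5):
    """Main task: mean squared error. Auxiliary task: softmax cross-entropy.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    mse = sum((d - t) ** 2 for d, t in zip(denoised, target_mean)) / len(target_mean)
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    cross_entropy = log_norm - logits[speaker_index]
    return mse + aux_weight * cross_entropy, mse, cross_entropy


# Toy usage with made-up 4-dimensional "i-vectors" for two speakers.
rng = random.Random(0)
dim, hidden_units, n_speakers = 4, 8, 2
params = {
    "shared": init_layer(dim, hidden_units, rng),
    "denoise": init_layer(hidden_units, dim, rng),
    "speaker": init_layer(hidden_units, n_speakers, rng),
}
clean = [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
labels = [0, 0, 1]
means = speaker_means(clean, labels)
noisy = [value + rng.gauss(0.0, 0.3) for value in clean[0]]  # simulated noise
denoised, logits = forward(params, noisy)
total, mse, cross_entropy = multitask_loss(denoised, means[0], logits, 0)
```

Only the forward pass and the loss are shown; training would minimise `total` over many noisy i-vectors with a gradient-based optimiser. Sharing the hidden layer between the two heads is one common way to let the auxiliary speaker task shape the representation used for denoising.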


Pages: 1670-1674

Number of pages: 5
