Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 13
Authors
Kang, Woo Hyun [2 ]
Mun, Sung Hwan [2 ]
Han, Min Hyun [2 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Keywords
Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION;
DOI
10.1109/ACCESS.2020.3012893
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, the deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.
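The abstract describes the goal (speaker embeddings stripped of nuisance variability) without the network details. As a hedged illustration only, and not the paper's actual supervised disentanglement network, the NumPy sketch below shows the underlying idea on toy data: given channel labels, a linear nuisance direction is estimated and projected out of the embeddings, which collapses channel separability while preserving speaker separability. All names and dimensions here are invented for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each 4-D "embedding" mixes a speaker component and a
# nuisance (recording-channel) component, mimicking the entangled
# representations the paper sets out to separate.
n_per_cell = 200
spk_means = np.array([[ 2.0, 0.0, 0.0, 0.0],
                      [-2.0, 0.0, 0.0, 0.0]])   # two speakers
chn_offset = np.array([0.0, 3.0, 0.0, 0.0])     # channel 1 shifts dim 1

X_parts, spk_parts, chn_parts = [], [], []
for s in range(2):
    for c in range(2):
        pts = (spk_means[s] + c * chn_offset
               + 0.1 * rng.standard_normal((n_per_cell, 4)))
        X_parts.append(pts)
        spk_parts += [s] * n_per_cell
        chn_parts += [c] * n_per_cell
X = np.vstack(X_parts)
spk = np.array(spk_parts)
chn = np.array(chn_parts)

# Fully supervised nuisance removal (a linear stand-in for the paper's
# learned disentanglement): estimate the channel direction from the
# labelled channel means and project it out of every embedding.
d = X[chn == 1].mean(axis=0) - X[chn == 0].mean(axis=0)
d /= np.linalg.norm(d)
X_clean = X - np.outer(X @ d, d)

def separation(X, labels):
    """Distance between the two class means (higher = more separable)."""
    return np.linalg.norm(X[labels == 0].mean(axis=0)
                          - X[labels == 1].mean(axis=0))

print(f"speaker sep: {separation(X, spk):.2f} -> {separation(X_clean, spk):.2f}")
print(f"channel sep: {separation(X, chn):.2f} -> {separation(X_clean, chn):.2f}")
```

In the paper this effect is achieved by training, not by a closed-form projection, but the before/after contrast (channel separability removed, speaker separability retained) is the property the proposed embeddings are evaluated on.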
Pages: 141838-141849
Page count: 12