Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization

Cited by: 10
Authors
Le, Nam [1, 2]
Odobez, Jean-Marc [1, 2]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Funding
European Union Horizon 2020
Keywords
speaker verification; deep neural networks; embedding learning; triplet loss
DOI
10.21437/Interspeech.2018-1685
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning a good speaker embedding is critical for many speech processing tasks, including recognition, verification, and diarization. To this end, we propose a complementary optimization objective, called intra-class loss, to improve deep speaker embeddings learned with triplet loss. This loss function is formulated as a soft constraint on the average pairwise distance between samples from the same class. Its goal is to prevent these samples from scattering within the embedding space, thereby increasing intra-class compactness. When intra-class loss is jointly optimized with triplet loss, we observe two major improvements: the deep embedding network achieves a more robust and discriminative representation, and the training process is more stable, with a faster convergence rate. We conduct experiments on two large public benchmark datasets for speaker verification, VoxCeleb and VoxForge. The results show that intra-class loss helps accelerate the convergence of deep network training and significantly improves the overall performance of the resulting embeddings.
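The abstract describes the intra-class loss only as a soft constraint on the average pairwise distance between same-class samples, optimized jointly with triplet loss. The following is a minimal sketch of one way such an objective could look, not the authors' formulation: the hinge form, the Euclidean distance, the `threshold`, `margin`, and `weight` parameters, and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def intra_class_loss(embeddings, labels, threshold=1.0):
    """Assumed form: hinge penalty on each class's mean pairwise distance."""
    by_class = {}
    for vec, lab in zip(embeddings, labels):
        by_class.setdefault(lab, []).append(vec)

    penalties = []
    for vecs in by_class.values():
        pairs = list(combinations(vecs, 2))
        if not pairs:
            continue  # a class with a single sample has no pairwise distance
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        # Soft constraint: penalize only the part that exceeds the threshold.
        penalties.append(max(0.0, mean_dist - threshold))
    return sum(penalties) / len(penalties) if penalties else 0.0


def triplet_loss(triplets, margin=0.2):
    """Standard margin-based triplet loss over (anchor, positive, negative)."""
    losses = [
        max(0.0, math.dist(a, p) - math.dist(a, n) + margin)
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


def joint_loss(embeddings, labels, triplets, weight=0.5,
               threshold=1.0, margin=0.2):
    """Hypothetical joint objective: triplet loss plus weighted intra-class term."""
    return triplet_loss(triplets, margin) + weight * intra_class_loss(
        embeddings, labels, threshold
    )


# Toy 2-D embeddings: speaker "a" is scattered, speaker "b" is compact.
embeddings = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (1.0, 1.0)]
labels = ["a", "a", "b", "b"]
triplets = [((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))]
total = joint_loss(embeddings, labels, triplets)
```

In this toy batch only the scattered speaker contributes to the intra-class term, which is the behavior a soft constraint is meant to have: compact classes are left alone, while spread-out classes are pulled together. In a real training setup the same terms would be computed on network outputs with an automatic differentiation framework.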
Pages: 2257-2261
Page count: 5