Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization

Cited by: 10

Authors:

Le, Nam [1, 2]

Odobez, Jean-Marc [1, 2]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

European Union Horizon 2020

Keywords:

speaker verification; deep neural networks; embedding learning; triplet loss

DOI:

10.21437/Interspeech.2018-1685

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning a good speaker embedding is critical for many speech processing tasks, including recognition, verification, and diarization. To this end, we propose a complementary optimization objective called intra-class loss to improve deep speaker embeddings learned with triplet loss. This loss function is formulated as a soft constraint on the averaged pair-wise distance between samples from the same class. Its goal is to prevent these samples from scattering within the embedding space and thereby increase intra-class compactness. When intra-class loss is jointly optimized with triplet loss, we observe two major improvements: the deep embedding network achieves a more robust and discriminative representation, and the training process is more stable with a faster convergence rate. We conduct experiments on two large public benchmark datasets for speaker verification, VoxCeleb and VoxForge. The results show that intra-class loss helps accelerate the convergence of deep network training and significantly improves the overall performance of the resulting embeddings.
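
The abstract gives no formula for the intra-class loss, so the following is only a minimal NumPy sketch of one way the idea could be expressed: a hinge penalty on the mean pair-wise distance within each class, added to a standard triplet loss. The hinge form, the function names, and the parameters `margin`, `threshold`, and `weight` are all assumptions made for illustration; the formulation in the paper may differ.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x (shape: n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def triplet_loss(emb, labels, margin=0.2):
    """Mean hinge loss over all valid (anchor, positive, negative) triplets."""
    d = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    pos = same & ~np.eye(len(labels), dtype=bool)  # same class, not self
    neg = ~same
    # loss[a, p, n] = max(0, d[a, p] - d[a, n] + margin)
    loss = np.maximum(0.0, d[:, :, None] - d[:, None, :] + margin)
    valid = pos[:, :, None] & neg[:, None, :]
    return float(loss[valid].mean()) if valid.any() else 0.0


def intra_class_loss(emb, labels, threshold=0.5):
    """Assumed form: hinge on the mean pair-wise distance within each class.

    The penalty is zero while a class stays compact (mean distance below
    `threshold`), which makes it a soft constraint rather than a hard one.
    """
    d = pairwise_distances(emb)
    penalties = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue  # a single sample has no pair-wise distance
        sub = d[np.ix_(idx, idx)]
        upper = np.triu_indices(len(idx), k=1)  # each unordered pair once
        penalties.append(max(0.0, sub[upper].mean() - threshold))
    return float(np.mean(penalties)) if penalties else 0.0


def joint_loss(emb, labels, weight=1.0, margin=0.2, threshold=0.5):
    """Triplet loss plus a weighted intra-class term (hypothetical weighting)."""
    return (triplet_loss(emb, labels, margin)
            + weight * intra_class_loss(emb, labels, threshold))


labels = np.array([0, 0, 1, 1])
compact = np.array([[0.0, 0.0], [0.0, 0.1], [1.0, 0.0], [1.0, 0.1]])
scattered = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 0.0], [5.0, 2.0]])
print(joint_loss(compact, labels), joint_loss(scattered, labels))
```

In this toy example both batches already satisfy the triplet margin, so the triplet term alone cannot distinguish them; only the intra-class term penalizes the scattered batch. This is the complementary role the abstract attributes to the intra-class loss.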

Citation

Pages: 2257-2261

Page count: 5
