MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

Cited by: 14
Authors
Ding, Wenhao [1]
He, Liang [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
generative adversarial networks; speaker verification; triplet loss
DOI
10.21437/Interspeech.2018-1023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose an enhanced triplet method that improves the encoding of embeddings by jointly using a generative adversarial mechanism and multitask optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and a softmax loss function. The GAN is introduced to increase the generality and diversity of samples, while the softmax loss reinforces speaker-related features. For simplicity, we term our method Multitasking Triplet Generative Adversarial Networks (MTGAN). Experiments on short utterances demonstrate that MTGAN reduces the verification equal error rate (EER) by 67% and 32% (both relative) over the conventional i-vector method and the state-of-the-art triplet loss method, respectively. This indicates that MTGAN outperforms triplet methods in expressing high-level speaker information.
Pages: 3633-3637
Page count: 5