MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

Cited by: 14

Authors:

Ding, Wenhao [1]

He, Liang [1]

Affiliation:

[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

generative adversarial networks; speaker verification; triplet loss

DOI:

10.21437/Interspeech.2018-1023

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose an enhanced triplet method that improves the encoding process of embeddings by jointly utilizing a generative adversarial mechanism and multitasking optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and a softmax loss function. The GAN is introduced to increase the generality and diversity of samples, while the softmax loss reinforces speaker-related features. For simplicity, we term our method Multitasking Triplet Generative Adversarial Networks (MTGAN). Experiments on short utterances demonstrate that MTGAN reduces the verification equal error rate (EER) by a relative 67% and 32% compared with the conventional i-vector method and the state-of-the-art triplet loss method, respectively. This indicates that MTGAN outperforms triplet methods at expressing high-level features of speaker information.
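The abstract combines a triplet loss with a softmax loss and reports relative EER reductions. The following is a minimal sketch of those two ideas only, not the authors' implementation: the adversarial (GAN) term is omitted, and the function names, the margin, and the weights `alpha` and `beta` are hypothetical choices made for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on squared Euclidean distances.

    The margin value is an assumption, not taken from the paper.
    """
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over speaker logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(anchor, positive, negative, logits, label,
                   margin=0.2, alpha=1.0, beta=1.0):
    """Weighted sum of the two losses; alpha and beta are hypothetical weights.

    The GAN term described in the abstract is deliberately left out.
    """
    return (alpha * triplet_loss(anchor, positive, negative, margin)
            + beta * softmax_cross_entropy(logits, label))


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.67 means a 67% relative reduction."""
    return (baseline_eer - new_eer) / baseline_eer
```

A relative reduction compares the drop in EER against the baseline EER, so a hypothetical baseline of 10.0% falling to 3.3% would count as a 67% relative reduction even though the absolute drop is 6.7 percentage points. These numbers are illustrative and are not results from the paper.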


Pages: 3633-3637

Page count: 5
