Self-Attention Networks for Text-Independent Speaker Verification

Cited by: 0
Authors
Bian, Tengyue [1 ,2 ]
Chen, Fangzhou [1 ,2 ]
Xu, Li [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Yuyao 315400, Zhejiang, Peoples R China
Source
PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019) | 2019
Keywords
Self-Attention; Speaker Verification; Triplet Loss
DOI
10.1109/ccdc.2019.8833466
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a self-attention based model for the text-independent speaker verification task, together with a novel variant of the triplet loss. Conventional convolutional neural networks (CNNs) used in speaker verification need very deep architectures to achieve considerable performance. In the proposed model, the self-attention mechanism can easily capture long-range dependencies and thus achieves better representational capability with fewer parameters. Building on the triplet loss, we propose a novel triplet selection method that makes training more efficient and yields a significant performance enhancement. Text-independent speaker verification experiments on the AISHELL-2 corpus show that the proposed model with the improved loss function reduces the verification equal error rate (EER) by 16.81% relative to a state-of-the-art ResNet-like model trained with the common triplet loss, while having fewer parameters and a lower computational cost.
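The triplet objective named in the abstract can be sketched as follows. This is a minimal illustration only: the standard triplet loss is well known, but the paper's improved selection method is not detailed in this record, so the `hardest_negative` rule below (pick the negative embedding closest to the anchor) is a common hypothetical stand-in, not the authors' method.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: the positive (same speaker) should be
    closer to the anchor than the negative (different speaker) is,
    by at least `margin`; otherwise a penalty is incurred."""
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)

def hardest_negative(anchor, candidates):
    """Hypothetical 'hard' selection rule: among candidate negatives,
    choose the one closest to the anchor (the most confusable one),
    which tends to produce more informative training triplets."""
    return min(candidates, key=lambda n: euclidean(anchor, n))
```

For example, a triplet that already satisfies the margin yields zero loss, while a violating triplet is penalized by the amount of the violation; hard-negative selection concentrates training on the violating cases.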
Pages: 3955-3960
Page count: 6
Cited references
25 records
[1]  
[Anonymous], 2017, IEEE C COMP VIS PATT
[2]  
Bahdanau D., 2016, arXiv:1409.0473
[3]  
Bredin H, 2017, INT CONF ACOUST SPEE, P5430, DOI 10.1109/ICASSP.2017.7953194
[4]   Person Re-Identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function [J].
Cheng, De ;
Gong, Yihong ;
Zhou, Sanping ;
Wang, Jinjun ;
Zheng, Nanning .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1335-1344
[5]  
Dehak N., 2009, Ecole de Technologie Superieure (Canada)
[6]   Front-End Factor Analysis for Speaker Verification [J].
Dehak, Najim ;
Kenny, Patrick J. ;
Dehak, Reda ;
Dumouchel, Pierre ;
Ouellet, Pierre .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798
[7]  
Du J., 2018, abs/1808.10583
[8]  
Hu Jie, 2017, arXiv:1709.01507, V7
[9]  
Garcia-Romero D., 2011, P INT 2011, P249, DOI DOI 10.21437/INTERSPEECH.2011-53
[10]  
He K., 2016, IEEE C COMPUT VIS PA, DOI 10.1109/CVPR.2016.90