Self-Attention Networks for Text-Independent Speaker Verification

Cited by: 0
Authors
Bian, Tengyue [1 ,2 ]
Chen, Fangzhou [1 ,2 ]
Xu, Li [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Robot Inst, Yuyao 315400, Zhejiang, Peoples R China
Source
PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019) | 2019
Keywords
Self-Attention; Speaker Verification; Triplet Loss
DOI
10.1109/ccdc.2019.8833466
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a self-attention based model for the text-independent speaker verification task, together with a novel variant of the triplet loss. Conventional convolutional neural networks (CNNs) used in speaker verification need very deep architectures to achieve considerable performance. In the proposed model, the self-attention mechanism can easily capture long-range dependencies and thus achieves better representational capability with fewer parameters. Building on the triplet loss, we propose a novel triplet selection method that makes training more efficient and yields a significant performance enhancement. Text-independent speaker verification experiments on the AISHELL-2 corpus show that the proposed model with the improved loss function reduces the verification equal error rate (EER) by 16.81% relative to a state-of-the-art ResNet-like model trained with the common triplet loss, while having fewer parameters and a lower computational cost.
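The triplet objective named in the abstract can be sketched as follows. This is a minimal illustration only: the standard triplet loss is well known, but the paper's improved selection method is not detailed in this record, so the `hardest_negative` rule below (pick the negative embedding closest to the anchor) is a common hypothetical stand-in, not the authors' method.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: the positive (same speaker) should be
    closer to the anchor than the negative (different speaker) is,
    by at least `margin`; otherwise a penalty is incurred."""
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)

def hardest_negative(anchor, candidates):
    """Hypothetical 'hard' selection rule: among candidate negatives,
    choose the one closest to the anchor (the most confusable one),
    which tends to produce more informative training triplets."""
    return min(candidates, key=lambda n: euclidean(anchor, n))
```

For example, a triplet that already satisfies the margin yields zero loss, while a violating triplet is penalized by the amount of the violation; hard-negative selection concentrates training on the violating cases.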
Pages: 3955-3960
Page count: 6
Cited references
25 records
[1]  
[Anonymous], 2017, IEEE C COMP VIS PATT
[2]  
Bahdanau D., 2016, arXiv:1409.0473
[3]  
Bredin H, 2017, INT CONF ACOUST SPEE, P5430, DOI 10.1109/ICASSP.2017.7953194
[4]   Person Re-Identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function [J].
Cheng, De ;
Gong, Yihong ;
Zhou, Sanping ;
Wang, Jinjun ;
Zheng, Nanning .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1335-1344
[5]  
Dehak N., 2009, Ecole de Technologie Superieure (Canada)
[6]   Front-End Factor Analysis for Speaker Verification [J].
Dehak, Najim ;
Kenny, Patrick J. ;
Dehak, Reda ;
Dumouchel, Pierre ;
Ouellet, Pierre .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798
[7]  
Du J., 2018, abs/1808.10583
[8]  
Hu Jie, 2017, arXiv:1709.01507, V7
[9]  
Garcia-Romero D., 2011, P INT 2011, P249, DOI DOI 10.21437/INTERSPEECH.2011-53
[10]  
He K., 2016, IEEE C COMPUT VIS PA, DOI 10.1109/CVPR.2016.90