Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17

Authors:

Bian, Tengyue [1]

Chen, Fangzhou [1]

Xu, Li [1]

Affiliations:

[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China

Source:

NEUROCOMPUTING | 2019 / Vol. 368

Keywords:

Self-attention; Speaker recognition; Triplet loss; VERIFICATION; MACHINES;

DOI:

10.1016/j.neucom.2019.08.046

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually use very deep or wide layers, resulting in many parameters and high computational cost. In addition, the triplet loss, which is widely used in speaker recognition, is difficult and inefficient to train. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. Furthermore, the Cluster-Range Loss, based on a well-designed online exemplar mining scheme, is proposed to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset are conducted to verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector system, as well as the x-vector system. © 2019 Elsevier B.V. All rights reserved.
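The abstract reports verification quality as an equal error rate (EER), the operating point at which the false-accept and false-reject rates coincide. The following is a minimal sketch of how such a figure can be computed from trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the "higher score means same speaker" convention are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) for similarity scores.

    Assumption: a higher score means the two utterances are more
    likely to come from the same speaker.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # A trial is accepted when its score is >= the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects

    # Take the threshold where the two error rates are closest and
    # report their average as the EER.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Toy scores (hypothetical, not taken from the paper).
eer, threshold = equal_error_rate(
    genuine_scores=[0.9, 0.8, 0.7, 0.4],
    impostor_scores=[0.6, 0.3, 0.2, 0.1],
)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real trial lists the two error curves rarely cross exactly at an observed score, so evaluation toolkits usually interpolate between neighbouring thresholds; the nearest-point choice here keeps the sketch short.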


Pages: 59-68

Page count: 10

50 records in total

[41] An efficient self-attention network for skeleton-based action recognition
Qin, Xiaofei
Cai, Rui
Yu, Jiabin
He, Changxiang
Zhang, Xuedian
SCIENTIFIC REPORTS, 2022, 12 (01):
[42] Human Activity Recognition Based on Self-Attention Mechanism in WiFi Environment
Ge, Fei
Yang, Zhimin
Dai, Zhenyang
Tan, Liansheng
Hu, Jianyuan
Li, Jiayuan
Qiu, Han
IEEE ACCESS, 2024, 12 : 85231 - 85243
[43] Ghost imaging object recognition based on self-attention mechanism network
He, Yunting
Yuan, Sheng
Song, Jiali
AIP ADVANCES, 2023, 13 (12)
[44] Self-Attention Networks For Motion Posture Recognition Based On Data Fusion
Ji, Zhihao
Xie, Qiang
4TH INTERNATIONAL CONFERENCE ON INFORMATICS ENGINEERING AND INFORMATION SCIENCE (ICIEIS2021), 2022, 12161
[45] Global Positional Self-Attention for Skeleton-Based Action Recognition
Kim, Jaehwan
Lee, Junsuk
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3355 - 3361
[46] Speaker Cluster based GMM Tokenization for Speaker Recognition
Ma, Bin
Zhu, Donglai
Tong, Rong
Li, Haizhou
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 505 - 508
[47] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
Paiva, Pedro V. V.
Ramos, Josue J. G.
Gavrilova, Marina
Carvalho, Marco A. G.
COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103 : 206 - 228
[48] Self-attention transfer networks for speech emotion recognition
Ziping ZHAO
Keru Wang
Zhongtian BAO
Zixing ZHANG
Nicholas CUMMINS
Shihuang SUN
Haishuai WANG
Jianhua TAO
Björn W. SCHULLER
Virtual Reality & Intelligent Hardware (虚拟现实与智能硬件), 2021, 3 (01) : 43 - 54
[49] Multilingual Speech Recognition with Self-Attention Structured Parameterization
Zhu, Yun
Haghani, Parisa
Tripathi, Anshuman
Ramabhadran, Bhuvana
Farris, Brian
Xu, Hainan
Lu, Han
Sak, Hasim
Leal, Isabel
Gaur, Neeraj
Moreno, Pedro J.
Zhang, Qian
INTERSPEECH 2020, 2020, : 4741 - 4745
[50] ON THE USEFULNESS OF SELF-ATTENTION FOR AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMERS
Zhang, Shucong
Loweimi, Erfan
Bell, Peter
Renals, Steve
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 89 - 96
