Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17
Authors
Bian, Tengyue [1 ]
Chen, Fangzhou [1 ]
Xu, Li [1 ]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; VERIFICATION; MACHINES;
DOI
10.1016/j.neucom.2019.08.046
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually utilize very deep or wide layers, resulting in many parameters and high computational cost. Besides, the triplet loss, which is widely used in speaker recognition, suffers from considerable training difficulty and inefficiency. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. In addition, the Cluster-Range Loss, based on well-designed online exemplar mining, is proposed to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset are conducted to verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector system, as well as the x-vector system. (C) 2019 Elsevier B.V. All rights reserved.
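The abstract only describes the objective at a high level, so the following is a minimal PyTorch sketch of a cluster-range-style loss combined with softmax cross entropy, under stated assumptions: the paper's exact Cluster-Range Loss formulation and its online exemplar mining are not reproduced in the abstract, and the function names, the centroid-based intra/inter terms, the margin, and the weighting factor alpha below are illustrative assumptions rather than the authors' method.

import torch
import torch.nn.functional as F

def cluster_range_style_loss(embeddings, labels, margin=1.0):
    # embeddings: (N, D) utterance-level embeddings; labels: (N,) integer speaker ids.
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])

    # Intra-class term: mean squared distance of each embedding to its own speaker
    # centroid (shrinks intra-class variation, as the abstract describes).
    intra = torch.stack([
        ((embeddings[labels == c] - centroids[i]) ** 2).sum(dim=1).mean()
        for i, c in enumerate(classes)
    ]).mean()

    # Inter-class term: hinge penalty whenever two speaker centroids fall closer than
    # the margin (enlarges inter-class distance). The margin value is an assumption.
    if classes.numel() > 1:
        dists = torch.cdist(centroids, centroids)
        off_diag = ~torch.eye(classes.numel(), dtype=torch.bool, device=dists.device)
        inter = F.relu(margin - dists[off_diag]).mean()
    else:
        inter = embeddings.new_zeros(())

    return intra + inter

def joint_objective(logits, embeddings, labels, alpha=1.0):
    # Joint training with softmax cross entropy, as stated in the abstract; the
    # weighting factor alpha is a hypothetical hyper-parameter, not taken from the paper.
    return F.cross_entropy(logits, labels) + alpha * cluster_range_style_loss(embeddings, labels)

In the paper's setup the embeddings would come from the ResNet encoder with self-attention; the sketch itself accepts any (N, D) batch of embeddings with integer speaker labels.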
Pages: 59-68
Number of pages: 10
Related papers
50 records in total
  • [11] Finger Vein Recognition Based on ResNet With Self-Attention
    Zhang, Zhibo
    Chen, Guanghua
    Zhang, Weifeng
    Wang, Huiyang
    IEEE ACCESS, 2024, 12 : 1943 - 1951
  • [12] Speaker diarization with variants of self-attention and joint speaker embedding extractor
    Fu, Pengbin
    Ma, Yuchen
    Yang, Huirong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (05) : 9169 - 9180
  • [13] Class-GE2E: Speaker Verification Using Self-Attention and Transfer Learning with Loss Combination
    Bae, Ara
    Kim, Wooil
    ELECTRONICS, 2022, 11 (06)
  • [14] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [15] LOCAL INFORMATION MODELING WITH SELF-ATTENTION FOR SPEAKER VERIFICATION
    Han, Bing
    Chen, Zhengyang
    Qian, Yanmin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6727 - 6731
  • [17] Speaker Verification Employing Combinations of Self-Attention Mechanisms
    Bae, Ara
    Kim, Wooil
    ELECTRONICS, 2020, 9 (12) : 1 - 11
  • [18] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019, : 2578 - 2582
  • [19] Self Attention Networks in Speaker Recognition
    Safari, Pooyan
    India, Miquel
    Hernando, Javier
    APPLIED SCIENCES-BASEL, 2023, 13 (11):
  • [20] Using Self-Attention LSTMs to Enhance Observations in Goal Recognition
    Amado, Leonardo
    Licks, Gabriel Paludo
    Marcon, Matheus
    Pereira, Ramon Fraga
    Meneguzzi, Felipe
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,