Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17
Authors
Bian, Tengyue [1 ]
Chen, Fangzhou [1 ]
Xu, Li [1 ]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; VERIFICATION; MACHINES;
DOI
10.1016/j.neucom.2019.08.046
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually utilize very deep or wide layers, resulting in many parameters and high computational cost. Besides, the triplet loss, which is widely used in speaker recognition, suffers from considerable training difficulty and inefficiency. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. In addition, the Cluster-Range Loss, based on well-designed online exemplar mining, is proposed to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset are conducted to verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector system, as well as the x-vector system. (C) 2019 Elsevier B.V. All rights reserved.
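The abstract only describes the objective at a high level, so the following is a minimal PyTorch sketch of a cluster-range-style loss combined with softmax cross entropy, under stated assumptions: the paper's exact Cluster-Range Loss formulation and its online exemplar mining are not reproduced in the abstract, and the function names, the centroid-based intra/inter terms, the margin, and the weighting factor alpha below are illustrative assumptions rather than the authors' method.

import torch
import torch.nn.functional as F

def cluster_range_style_loss(embeddings, labels, margin=1.0):
    # embeddings: (N, D) utterance-level embeddings; labels: (N,) integer speaker ids.
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])

    # Intra-class term: mean squared distance of each embedding to its own speaker
    # centroid (shrinks intra-class variation, as the abstract describes).
    intra = torch.stack([
        ((embeddings[labels == c] - centroids[i]) ** 2).sum(dim=1).mean()
        for i, c in enumerate(classes)
    ]).mean()

    # Inter-class term: hinge penalty whenever two speaker centroids fall closer than
    # the margin (enlarges inter-class distance). The margin value is an assumption.
    if classes.numel() > 1:
        dists = torch.cdist(centroids, centroids)
        off_diag = ~torch.eye(classes.numel(), dtype=torch.bool, device=dists.device)
        inter = F.relu(margin - dists[off_diag]).mean()
    else:
        inter = embeddings.new_zeros(())

    return intra + inter

def joint_objective(logits, embeddings, labels, alpha=1.0):
    # Joint training with softmax cross entropy, as stated in the abstract; the
    # weighting factor alpha is a hypothetical hyper-parameter, not taken from the paper.
    return F.cross_entropy(logits, labels) + alpha * cluster_range_style_loss(embeddings, labels)

In the paper's setup the embeddings would come from the ResNet encoder with self-attention; the sketch itself accepts any (N, D) batch of embeddings with integer speaker labels.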
Pages: 59-68
Number of pages: 10
Related papers
50 records in total
  • [11] Finger Vein Recognition Based on ResNet With Self-Attention
    Zhang, Zhibo
    Chen, Guanghua
    Zhang, Weifeng
    Wang, Huiyang
    IEEE ACCESS, 2024, 12 : 1943 - 1951
  • [12] Speaker diarization with variants of self-attention and joint speaker embedding extractor
    Fu, Pengbin
    Ma, Yuchen
    Yang, Huirong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (05) : 9169 - 9180
  • [13] Class-GE2E: Speaker Verification Using Self-Attention and Transfer Learning with Loss Combination
    Bae, Ara
    Kim, Wooil
    ELECTRONICS, 2022, 11 (06)
  • [14] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [15] LOCAL INFORMATION MODELING WITH SELF-ATTENTION FOR SPEAKER VERIFICATION
    Han, Bing
    Chen, Zhengyang
    Qian, Yanmin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6727 - 6731
  • [17] Speaker Verification Employing Combinations of Self-Attention Mechanisms
    Bae, Ara
    Kim, Wooil
    ELECTRONICS, 2020, 9 (12) : 1 - 11
  • [18] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019, : 2578 - 2582
  • [19] Self Attention Networks in Speaker Recognition
    Safari, Pooyan
    India, Miquel
    Hernando, Javier
    APPLIED SCIENCES-BASEL, 2023, 13 (11):
  • [20] Using Self-Attention LSTMs to Enhance Observations in Goal Recognition
    Amado, Leonardo
    Licks, Gabriel Paludo
    Marcon, Matheus
    Pereira, Ramon Fraga
    Meneguzzi, Felipe
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,