Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17
Authors
Bian, Tengyue [1]
Chen, Fangzhou [1]
Xu, Li [1]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; Verification; Machines
DOI
10.1016/j.neucom.2019.08.046
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually use very deep or wide layers, resulting in many parameters and high computational cost. In addition, the triplet loss, which is widely used in speaker recognition, is difficult and inefficient to train. In this work, we propose combining the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. We also propose the Cluster-Range Loss, based on a well-designed online exemplar mining scheme, to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset verify the effectiveness of the proposed scheme. By jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss, the proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification. For speaker verification, we achieve an equal error rate (EER) of 5.5% without any heavy-tailed backend, which is competitive with the state-of-the-art i-vector and x-vector systems. © 2019 Elsevier B.V. All rights reserved.
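For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of its standard textbook form, max(0, d(a, p) - d(a, n) + margin). It is not the Cluster-Range Loss, whose definition is not given in this record. The function names, the margin of 0.2, and the toy 2-D embeddings are all assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an illustrative assumption, not taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D "speaker embeddings" (made-up values, for illustration only).
anchor = [0.0, 0.0]
positive = [0.3, 0.4]        # same speaker, distance 0.5 from the anchor
easy_negative = [3.0, 4.0]   # different speaker, distance 5.0 from the anchor
hard_negative = [0.0, 0.6]   # different speaker, distance 0.6 from the anchor

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, hard)
```

In this toy case the easy triplet yields zero loss and therefore contributes no gradient, while only the hard triplet is informative. This is one commonly cited reason why triplet-based training can be inefficient without some form of exemplar mining.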
Pages: 59-68
Page count: 10