Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17
Authors
Bian, Tengyue [1]
Chen, Fangzhou [1]
Xu, Li [1]
Affiliation
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; VERIFICATION; MACHINES;
DOI
10.1016/j.neucom.2019.08.046
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually use very deep or wide layers, resulting in many parameters and high computational cost. In addition, the triplet loss, which is widely used in speaker recognition, is difficult and inefficient to train. In this work, we propose combining the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. We also propose the Cluster-Range Loss, based on a well-designed online exemplar mining scheme, to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss. For speaker verification, we achieve an equal error rate (EER) of 5.5% without any heavy-tailed backend, which is competitive with the state-of-the-art i-vector and x-vector systems. © 2019 Elsevier B.V. All rights reserved.
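The verification result above is reported as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below is only an illustration of how that metric can be approximated from trial scores; it is not the authors' evaluation code, and the function name `equal_error_rate` and the toy scores are assumptions introduced here.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely the same speaker".
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    genuine = [0.9, 0.4]
    impostor = [0.1, 0.6]
    print(equal_error_rate(genuine, impostor))
```

With a finite set of trials the two error rates rarely meet exactly, so this sketch takes the mean of the two rates at the closest threshold; other interpolation conventions exist and can give slightly different values.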
Pages: 59-68
Number of pages: 10