Self-Attention Encoding and Pooling for Speaker Recognition

Cited by: 35
Authors
Safari, Pooyan [1 ]
India, Miquel [1 ]
Hernando, Javier [1 ]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Source
INTERSPEECH 2020
Keywords
Self-Attention Encoding; Self-Attention Pooling; Speaker Verification; Speaker Embedding;
DOI
10.21437/Interspeech.2020-1446
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104 ; 100213 ;
Abstract
The computing power of mobile devices limits end-user applications in terms of storage size, processing, memory, and energy consumption. These limitations motivate researchers to design more efficient deep models. On the other hand, self-attention networks based on the Transformer architecture have attracted remarkable interest due to their high parallelization capability and strong performance on a variety of Natural Language Processing (NLP) applications. Inspired by the Transformer, we propose a tandem Self-Attention Encoding and Pooling (SAEP) mechanism to obtain a discriminative speaker embedding from non-fixed-length speech utterances. SAEP is a stack of identical blocks relying solely on self-attention and position-wise feed-forward networks to create vector representations of speakers. This approach encodes short-term speaker spectral features into speaker embeddings to be used in text-independent speaker verification. We have evaluated this approach on both the VoxCeleb1 and VoxCeleb2 datasets. The proposed architecture outperforms the baseline x-vector and shows competitive performance with some other convolution-based benchmarks, with a significant reduction in model size. It employs 94%, 95%, and 73% fewer parameters than ResNet-34, ResNet-50, and x-vector, respectively. This indicates that the proposed fully attention-based architecture is more efficient at extracting time-invariant features from speaker utterances.
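The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of a tandem self-attention encoder followed by self-attention pooling. All hyperparameters (model dimension 256, 2 blocks, 4 heads, 40-dimensional input features) and the exact block layout (residual connections, layer normalization, the linear pooling scorer) are illustrative assumptions for this sketch, not the paper's configuration.

```python
# Minimal sketch of tandem Self-Attention Encoding and Pooling (SAEP),
# assuming PyTorch. Layer sizes and block details are placeholders.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention + position-wise feed-forward,
    each wrapped with a residual connection and layer normalization (assumed layout)."""

    def __init__(self, dim: int, heads: int, ff_dim: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                        # x: (batch, frames, dim)
        a, _ = self.attn(x, x, x)                # self-attention over the frame axis
        x = self.norm1(x + self.drop(a))
        x = self.norm2(x + self.drop(self.ff(x)))
        return x


class SelfAttentionPooling(nn.Module):
    """Collapse a variable number of frames into one utterance-level vector:
    a learned scorer assigns a weight per frame, and the output is the
    softmax-weighted sum of frame representations."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, frames, dim)
        w = torch.softmax(self.scorer(x), dim=1) # (batch, frames, 1)
        return (w * x).sum(dim=1)                # (batch, dim) speaker embedding


class SAEP(nn.Module):
    """Tandem encoder + pooling: spectral frames in, fixed-size speaker embedding out."""

    def __init__(self, feat_dim: int = 40, dim: int = 256, blocks: int = 2, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, dim)     # project spectral features to model dim
        self.encoder = nn.Sequential(*[EncoderBlock(dim, heads, 4 * dim) for _ in range(blocks)])
        self.pool = SelfAttentionPooling(dim)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        return self.pool(self.encoder(self.proj(feats)))


# Usage: a batch of 8 utterances, each 300 frames of 40-dim filterbank features.
embedding = SAEP()(torch.randn(8, 300, 40))      # -> (8, 256)
```

The pooling stage is what removes the time axis, so the same network maps utterances of any length to a fixed-size embedding; handling genuinely variable-length batches would additionally require padding masks, which are omitted here for brevity.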
Pages: 941 - 945
Page count: 5
Related Papers
50 records in total
  • [21] Cyclic Self-attention for Point Cloud Recognition
    Zhu, Guanyu
    Zhou, Yong
    Yao, Rui
    Zhu, Hancheng
    Zhao, Jiaqi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [22] Named Entity Recognition in Persian Language based on Self-attention Mechanism with Weighted Relational Position Encoding
    Ganjalipour, Ebrahim
    Sheikhani, Amir Hossein Refahi
    Kordrostami, Sohrab
    Hosseinzadeh, Ali Asghar
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [23] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [24] A Window-Based Self-Attention approach for sentence encoding
    Huang, Ting
    Deng, Zhi-Hong
    Shen, Gehui
    Chen, Xi
    NEUROCOMPUTING, 2020, 375 : 25 - 31
  • [25] SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition
    Xia, Yan
    Xu, Yusheng
    Li, Shuang
    Wang, Rui
    Du, Juan
    Cremers, Daniel
    Stilla, Uwe
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11343 - 11352
  • [26] Self-attention transfer networks for speech emotion recognition
    Zhao, Ziping
    Wang, Keru
    Bao, Zhongtian
    Zhang, Zixing
    Cummins, Nicholas
    Sun, Shihuang
    Wang, Haishuai
    Tao, Jianhua
    Schuller, Björn W.
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2021, 3 (01): 43 - 54
  • [27] Multilingual Speech Recognition with Self-Attention Structured Parameterization
    Zhu, Yun
    Haghani, Parisa
    Tripathi, Anshuman
    Ramabhadran, Bhuvana
    Farris, Brian
    Xu, Hainan
    Lu, Han
    Sak, Hasim
    Leal, Isabel
    Gaur, Neeraj
    Moreno, Pedro J.
    Zhang, Qian
    INTERSPEECH 2020, 2020, : 4741 - 4745
  • [28] ON THE USEFULNESS OF SELF-ATTENTION FOR AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMERS
    Zhang, Shucong
    Loweimi, Erfan
    Bell, Peter
    Renals, Steve
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 89 - 96
  • [29] A visual self-attention network for facial expression recognition
    Yu, Naigong
    Bai, Deguo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [30] Polarimetric HRRP Recognition Based on ConvLSTM With Self-Attention
    Zhang, Liang
    Li, Yang
    Wang, Yanhua
    Wang, Junfu
    Long, Teng
    IEEE SENSORS JOURNAL, 2021, 21 (06) : 7884 - 7898