Self-Attention Encoding and Pooling for Speaker Recognition

Cited by: 35
Authors
Safari, Pooyan [1 ]
India, Miquel [1 ]
Hernando, Javier [1 ]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Source
INTERSPEECH 2020
Keywords
Self-Attention Encoding; Self-Attention Pooling; Speaker Verification; Speaker Embedding;
DOI
10.21437/Interspeech.2020-1446
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104 ; 100213 ;
Abstract
The computing power of mobile devices limits end-user applications in terms of storage size, processing, memory, and energy consumption. These limitations motivate researchers to design more efficient deep models. On the other hand, self-attention networks based on the Transformer architecture have attracted remarkable interest due to their high parallelization capability and strong performance on a variety of Natural Language Processing (NLP) applications. Inspired by the Transformer, we propose a tandem Self-Attention Encoding and Pooling (SAEP) mechanism to obtain a discriminative speaker embedding from non-fixed-length speech utterances. SAEP is a stack of identical blocks relying solely on self-attention and position-wise feed-forward networks to create vector representations of speakers. This approach encodes short-term speaker spectral features into speaker embeddings to be used in text-independent speaker verification. We have evaluated this approach on both the VoxCeleb1 and VoxCeleb2 datasets. The proposed architecture outperforms the baseline x-vector and shows competitive performance with some other convolution-based benchmarks, with a significant reduction in model size. It employs 94%, 95%, and 73% fewer parameters than ResNet-34, ResNet-50, and x-vector, respectively. This indicates that the proposed fully attention-based architecture is more efficient at extracting time-invariant features from speaker utterances.
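The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of a tandem self-attention encoder followed by self-attention pooling. All hyperparameters (model dimension 256, 2 blocks, 4 heads, 40-dimensional input features) and the exact block layout (residual connections, layer normalization, the linear pooling scorer) are illustrative assumptions for this sketch, not the paper's configuration.

```python
# Minimal sketch of tandem Self-Attention Encoding and Pooling (SAEP),
# assuming PyTorch. Layer sizes and block details are placeholders.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention + position-wise feed-forward,
    each wrapped with a residual connection and layer normalization (assumed layout)."""

    def __init__(self, dim: int, heads: int, ff_dim: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                        # x: (batch, frames, dim)
        a, _ = self.attn(x, x, x)                # self-attention over the frame axis
        x = self.norm1(x + self.drop(a))
        x = self.norm2(x + self.drop(self.ff(x)))
        return x


class SelfAttentionPooling(nn.Module):
    """Collapse a variable number of frames into one utterance-level vector:
    a learned scorer assigns a weight per frame, and the output is the
    softmax-weighted sum of frame representations."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (batch, frames, dim)
        w = torch.softmax(self.scorer(x), dim=1) # (batch, frames, 1)
        return (w * x).sum(dim=1)                # (batch, dim) speaker embedding


class SAEP(nn.Module):
    """Tandem encoder + pooling: spectral frames in, fixed-size speaker embedding out."""

    def __init__(self, feat_dim: int = 40, dim: int = 256, blocks: int = 2, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, dim)     # project spectral features to model dim
        self.encoder = nn.Sequential(*[EncoderBlock(dim, heads, 4 * dim) for _ in range(blocks)])
        self.pool = SelfAttentionPooling(dim)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        return self.pool(self.encoder(self.proj(feats)))


# Usage: a batch of 8 utterances, each 300 frames of 40-dim filterbank features.
embedding = SAEP()(torch.randn(8, 300, 40))      # -> (8, 256)
```

The pooling stage is what removes the time axis, so the same network maps utterances of any length to a fixed-size embedding; handling genuinely variable-length batches would additionally require padding masks, which are omitted here for brevity.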
Pages: 941 - 945
Page count: 5
Related Papers
50 records in total
  • [21] Cyclic Self-attention for Point Cloud Recognition
    Zhu, Guanyu
    Zhou, Yong
    Yao, Rui
    Zhu, Hancheng
    Zhao, Jiaqi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [22] Named Entity Recognition in Persian Language based on Self-attention Mechanism with Weighted Relational Position Encoding
    Ganjalipour, Ebrahim
    Sheikhani, Amir Hossein Refahi
    Kordrostami, Sohrab
    Hosseinzadeh, Ali Asghar
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [23] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [24] A Window-Based Self-Attention approach for sentence encoding
    Huang, Ting
    Deng, Zhi-Hong
    Shen, Gehui
    Chen, Xi
    NEUROCOMPUTING, 2020, 375 : 25 - 31
  • [25] SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition
    Xia, Yan
    Xu, Yusheng
    Li, Shuang
    Wang, Rui
    Du, Juan
    Cremers, Daniel
    Stilla, Uwe
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11343 - 11352
  • [26] Self-attention transfer networks for speech emotion recognition
    Zhao, Ziping
    Wang, Keru
    Bao, Zhongtian
    Zhang, Zixing
    Cummins, Nicholas
    Sun, Shihuang
    Wang, Haishuai
    Tao, Jianhua
    Schuller, Björn W.
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2021, 3 (01): 43 - 54
  • [27] Multilingual Speech Recognition with Self-Attention Structured Parameterization
    Zhu, Yun
    Haghani, Parisa
    Tripathi, Anshuman
    Ramabhadran, Bhuvana
    Farris, Brian
    Xu, Hainan
    Lu, Han
    Sak, Hasim
    Leal, Isabel
    Gaur, Neeraj
    Moreno, Pedro J.
    Zhang, Qian
    INTERSPEECH 2020, 2020, : 4741 - 4745
  • [28] ON THE USEFULNESS OF SELF-ATTENTION FOR AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMERS
    Zhang, Shucong
    Loweimi, Erfan
    Bell, Peter
    Renals, Steve
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 89 - 96
  • [29] A visual self-attention network for facial expression recognition
    Yu, Naigong
    Bai, Deguo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [30] Polarimetric HRRP Recognition Based on ConvLSTM With Self-Attention
    Zhang, Liang
    Li, Yang
    Wang, Yanhua
    Wang, Junfu
    Long, Teng
    IEEE SENSORS JOURNAL, 2021, 21 (06) : 7884 - 7898