A Speaker Recognition Method Based on Dynamic Convolution with Dual Attention Mechanism

Times cited: 0
Authors
Luo, Yuan [1]
Zhu, Kuilin [1]
Wang, Wenhao [1]
Lin, Ziyao [1]
Affiliation
[1] Chongqing University of Posts and Telecommunications, School of Optoelectronic Engineering, Chongqing 400065, People's Republic of China
Keywords
Speaker Recognition; Deep Learning; Attention Mechanism; Dynamic Convolution
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Deep neural networks have attracted significant attention for text-independent speaker recognition. However, because traditional static convolutional neural networks have fixed parameters, they cannot flexibly capture the phoneme variation that is integral to spoken sentences. To address this limitation, this paper proposes a speaker recognition method based on dynamic convolution with channel-spatial attention. The method uses a dual (channel and spatial) attention mechanism to generate dynamic convolution kernels, which better captures phoneme variation across different speech inputs. We evaluated the method on the TIMIT dataset in several network frameworks. The best performance is obtained when each dynamic convolution is generated from four static convolution kernels. In the ResNet-34 framework, the proposed method improves the Equal Error Rate (EER) by 31.1% relative to the static convolution baseline (CNN) and by 20.3% relative to single-attention dynamic convolution (DynamicConv). The proposed method also improves performance in all other network frameworks tested. These findings demonstrate the effectiveness of the proposed method and the importance of accounting for phoneme variation in speaker recognition systems.
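The abstract describes dual-attention dynamic convolution only at a high level, so the pure-Python sketch below illustrates the general idea under assumptions that are ours, not the authors'. Following the DynamicConv approach (Chen et al., CVPR 2020), each layer holds K = 4 static kernels and mixes them with input-dependent softmax weights `pi`. The "dual attention" is modeled, hypothetically, as a channel gate plus a per-frame (temporal) gate standing in for the spatial branch, both applied before the pooling that produces `pi`. All names (`DualAttentionDynamicConv1d`, `w_chan`, `w_space`, `w_pi`, `tau`) are illustrative, the weights are random rather than trained, and a 1-D convolution over frames stands in for whatever kernel shape the paper actually uses.

```python
import math
import random


def softmax(z, tau=1.0):
    """Temperature softmax; subtracting the max keeps exp() from overflowing."""
    m = max(z)
    e = [math.exp((v - m) / tau) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conv1d(x, w):
    """Same-length 1-D cross-correlation (the deep-learning 'convolution').

    x: [c_in][T] feature map; w: [c_out][c_in][k] kernel; zero padding at the edges.
    """
    T, k = len(x[0]), len(w[0][0])
    pad = k // 2
    out = []
    for w_o in w:
        row = []
        for t in range(T):
            acc = 0.0
            for x_i, w_oi in zip(x, w_o):
                for j, w_val in enumerate(w_oi):
                    tt = t + j - pad
                    if 0 <= tt < T:
                        acc += w_val * x_i[tt]
            row.append(acc)
        out.append(row)
    return out


class DualAttentionDynamicConv1d:
    """Hypothetical sketch: K static kernels mixed by input-dependent weights pi,
    where pi is computed from a feature map gated by channel and temporal attention."""

    def __init__(self, c_in, c_out, k, num_kernels=4, tau=1.0, seed=0):
        rng = random.Random(seed)
        g = lambda: rng.gauss(0.0, 0.5)
        self.tau = tau
        # K static kernels, each of shape [c_out][c_in][k]
        self.kernels = [[[[g() for _ in range(k)] for _ in range(c_in)]
                         for _ in range(c_out)] for _ in range(num_kernels)]
        self.w_chan = [[g() for _ in range(c_in)] for _ in range(c_in)]       # channel branch (SE-like)
        self.w_space = (g(), g())                                            # temporal branch on [mean; max]
        self.w_pi = [[g() for _ in range(c_in)] for _ in range(num_kernels)]  # kernel-attention head

    def kernel_attention(self, x):
        T = len(x[0])
        # Channel attention: one gate per input channel from its time average.
        avg = [sum(ch) / T for ch in x]
        alpha = [sigmoid(sum(w * a for w, a in zip(row, avg))) for row in self.w_chan]
        # Temporal attention: one gate per frame from the cross-channel mean and max.
        a, b = self.w_space
        beta = [sigmoid(a * sum(ch[t] for ch in x) / len(x) + b * max(ch[t] for ch in x))
                for t in range(T)]
        # Pool the doubly gated map over time, then softmax over the K kernels.
        d = [sum(alpha[i] * beta[t] * x[i][t] for t in range(T)) / T for i in range(len(x))]
        logits = [sum(w * di for w, di in zip(row, d)) for row in self.w_pi]
        return softmax(logits, self.tau)

    def __call__(self, x):
        pi = self.kernel_attention(x)
        c_out, c_in, k = len(self.kernels[0]), len(self.kernels[0][0]), len(self.kernels[0][0][0])
        # Aggregate W(x) = sum_k pi_k(x) * W_k, then run one ordinary convolution.
        W = [[[sum(p * Wk[o][i][j] for p, Wk in zip(pi, self.kernels)) for j in range(k)]
              for i in range(c_in)] for o in range(c_out)]
        return conv1d(x, W)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(3)]  # 3 channels x 20 frames
    layer = DualAttentionDynamicConv1d(c_in=3, c_out=2, k=3, num_kernels=4)
    print("kernel weights pi:", layer.kernel_attention(x))  # 4 positive weights summing to 1
    y = layer(x)
    print("output shape:", len(y), "x", len(y[0]))  # 2 x 20
```

Because convolution is linear in the kernel, mixing the K kernels first and convolving once gives the same output as convolving with each static kernel and mixing the results. Aggregating first is what keeps the cost of dynamic convolution close to that of a single static convolution, while `pi` still changes from one utterance to the next.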
Pages: 825-832
Number of pages: 8