Dynamic Convolution With Global-Local Information for Session-Invariant Speaker Representation Learning

Cited by: 7
Authors
Gu, Bin [1]
Guo, Wu [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230036, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolution; Kernel; NIST; Training; Training data; Time-frequency analysis; Neural networks; Speaker verification; dynamic convolution; mismatch problem; acoustic variability; RECOGNITION;
DOI
10.1109/LSP.2021.3136141
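Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809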
Abstract
Various mismatched conditions degrade the performance of speaker verification (SV) systems. To address this issue, we extract robust speaker representations by devising a global-local information-based dynamic convolution neural network. In the proposed method, both global and local information of the input features are exploited to dynamically modify the convolution kernel values. This increases the model's capability to capture speaker characteristics by compensating for both inter- and intra-session variabilities. Extensive experiments on four publicly available SV datasets show significant and consistent improvements over conventional approaches. The effectiveness of the proposed method is further investigated using ablation studies and visualizations.
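The abstract does not specify how the kernel values are modified, so the following is only a minimal sketch of the general idea of an input-conditioned (dynamic) convolution, not the authors' architecture. It assumes a toy single-channel 1-D signal, a small bank of candidate kernels, and hypothetical scalar gating weights `w_global` and `w_local`; the "global" statistic is taken to be the mean over the whole signal and the "local" statistic the mean over the current window. All of these names and choices are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(signal, kernels, w_global, w_local):
    """Toy dynamic 1-D convolution (no padding, stride 1).

    signal   : list of floats (hypothetical input feature sequence)
    kernels  : list of K candidate kernels, all of the same width
    w_global : K hypothetical gating weights applied to the global statistic
    w_local  : K hypothetical gating weights applied to the local statistic
    """
    width = len(kernels[0])
    # Assumed "global information": mean over the whole input.
    global_stat = sum(signal) / len(signal)
    output = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        # Assumed "local information": mean over the current window.
        local_stat = sum(window) / width
        # Score each candidate kernel from both statistics, then normalize.
        scores = [wg * global_stat + wl * local_stat
                  for wg, wl in zip(w_global, w_local)]
        attention = softmax(scores)
        # The effective kernel is an input-dependent mixture of the candidates.
        mixed = [sum(a * k[i] for a, k in zip(attention, kernels))
                 for i in range(width)]
        output.append(sum(m * x for m, x in zip(mixed, window)))
    return output


signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
# With all gating weights at zero the mixture is a plain average of the kernels.
static_out = dynamic_conv1d(signal, kernels, [0.0, 0.0], [0.0, 0.0])
# With non-zero gating weights the effective kernel changes per window.
dynamic_out = dynamic_conv1d(signal, kernels, [0.5, -0.5], [1.0, -1.0])
```

In this sketch the kernel applied at each position depends on the input itself, which is the property the abstract attributes to dynamic convolution; a standard convolution corresponds to the case where the mixture weights are fixed.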
Pages: 404-408
Number of pages: 5