Dynamic Convolution With Global-Local Information for Session-Invariant Speaker Representation Learning

Cited by: 6
Authors
Gu, Bin [1 ]
Guo, Wu [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230036, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolution; Kernel; NIST; Training; Training data; Time-frequency analysis; Neural networks; Speaker verification; dynamic convolution; mismatch problem; acoustic variability; RECOGNITION;
DOI
10.1109/LSP.2021.3136141
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Various mismatched conditions result in performance degradation of speaker verification (SV) systems. To address this issue, we extract robust speaker representations by devising a global-local information-based dynamic convolution neural network. In the proposed method, both global and local information from the input features are exploited to dynamically modify the convolution kernel values. This increases the model's capability to capture speaker characteristics by compensating for both inter- and intra-session variability. Extensive experiments on four publicly available SV datasets show significant and consistent improvements over conventional approaches. The effectiveness of the proposed method is further investigated through ablation studies and visualizations.
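The abstract describes dynamic convolution whose kernel values are conditioned on both global (utterance-level) and local (frame-level) information. The following is a minimal PyTorch sketch of that idea, not the authors' released implementation: the number of candidate kernels, the specific global/local branches, and the additive fusion of the two logit streams are illustrative assumptions.

```python
# Minimal sketch of dynamic 1-D convolution with global-local kernel attention.
# All layer choices (K candidate kernels, mean pooling for the global branch,
# a small conv for the local branch, additive fusion) are assumptions for
# illustration, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalDynamicConv1d(nn.Module):
    """Mixes K candidate kernels per time step with attention weights
    derived from global (utterance-level) and local (frame-level) cues."""

    def __init__(self, in_channels, out_channels, kernel_size, num_kernels=4):
        super().__init__()
        self.K = num_kernels
        self.kernel_size = kernel_size
        # K candidate convolution kernels stored in one weight tensor.
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_channels, in_channels, kernel_size) * 0.01
        )
        # Global branch: utterance-level average pooling -> K logits.
        self.global_fc = nn.Linear(in_channels, num_kernels)
        # Local branch: frame-level context -> K logits per frame.
        self.local_conv = nn.Conv1d(in_channels, num_kernels, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (batch, in_channels, time)
        global_logits = self.global_fc(x.mean(dim=2))              # (B, K)
        local_logits = self.local_conv(x)                          # (B, K, T)
        # Fuse global and local information, normalize over the K kernels.
        attn = torch.softmax(global_logits.unsqueeze(-1) + local_logits, dim=1)

        # Apply each candidate kernel, then mix the outputs per time step.
        # Since convolution is linear in the kernel, mixing outputs equals
        # mixing kernels with the same per-frame weights.
        outs = torch.stack(
            [F.conv1d(x, self.weight[k], padding=self.kernel_size // 2)
             for k in range(self.K)],
            dim=1,
        )                                                          # (B, K, C_out, T)
        return (outs * attn.unsqueeze(2)).sum(dim=1)               # (B, C_out, T)


if __name__ == "__main__":
    layer = GlobalLocalDynamicConv1d(in_channels=80, out_channels=128, kernel_size=5)
    feats = torch.randn(2, 80, 200)    # e.g. two utterances of 80-dim filterbanks
    print(layer(feats).shape)          # torch.Size([2, 128, 200])
```

In this reading, the global branch compensates for session-level (inter-session) variability while the local branch tracks frame-level (intra-session) variation, which is how the abstract motivates combining the two.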
Pages: 404-408
Page count: 5