Dynamic Convolution With Global-Local Information for Session-Invariant Speaker Representation Learning

Cited by: 6
Authors
Gu, Bin [1]
Guo, Wu [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230036, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolution; Kernel; NIST; Training; Training data; Time-frequency analysis; Neural networks; Speaker verification; dynamic convolution; mismatch problem; acoustic variability; RECOGNITION;
DOI
10.1109/LSP.2021.3136141
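Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809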
Abstract
Various mismatched conditions degrade the performance of speaker verification (SV) systems. To address this issue, we extract robust speaker representations by devising a global-local information-based dynamic convolution neural network. In the proposed method, both global and local information of the input features are exploited to dynamically modify the convolution kernel values. This increases the model's capability to capture speaker characteristics by compensating for both inter- and intra-session variabilities. Extensive experiments on four publicly available SV datasets show significant and consistent improvements over the conventional approaches. The effectiveness of the proposed method is further investigated using ablation studies and visualizations.
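The abstract does not spell out how the kernel values are modified, so the following is only a toy sketch of the general idea of input-dependent kernels, not the authors' architecture. It assumes one common formulation of dynamic convolution: a softmax-weighted mixture of several candidate kernels, with the mixing logits computed from a global statistic (the mean over the whole input) plus a local statistic (the mean of the current window). The function name `dynamic_conv1d`, the projection parameters, and the choice of means as statistics are all hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(signal, kernels, global_proj, local_proj):
    """Toy dynamic 1-D convolution (illustrative assumption, not the paper's model).

    signal      : list of floats, one feature value per frame
    kernels     : K candidate kernels, each a list of the same odd length W
    global_proj : K (weight, bias) pairs mapping the whole-input mean to logits
    local_proj  : K (weight, bias) pairs mapping the window mean to logits
    Returns the output sequence and the per-frame mixing weights.
    """
    width = len(kernels[0])
    half = width // 2
    # Global information: one statistic summarising the whole input.
    global_mean = sum(signal) / len(signal)
    global_logits = [w * global_mean + b for w, b in global_proj]
    # Zero-pad so the output has the same length as the input.
    padded = [0.0] * half + list(signal) + [0.0] * half
    output, all_weights = [], []
    for t in range(len(signal)):
        window = padded[t:t + width]
        # Local information: a statistic of the current window only.
        local_mean = sum(window) / width
        local_logits = [w * local_mean + b for w, b in local_proj]
        # Combine both sources and normalise into mixing weights.
        weights = softmax([g + l for g, l in zip(global_logits, local_logits)])
        # Input-dependent kernel: weighted sum of the candidate kernels.
        kernel = [sum(a * k[i] for a, k in zip(weights, kernels))
                  for i in range(width)]
        # Apply the kernel without flipping, as neural-network layers do.
        output.append(sum(x * c for x, c in zip(window, kernel)))
        all_weights.append(weights)
    return output, all_weights


# Hypothetical inputs: a smoothing kernel and a difference kernel.
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
kernels = [[1 / 3, 1 / 3, 1 / 3], [-1.0, 0.0, 1.0]]
global_proj = [(0.5, 0.0), (-0.5, 0.0)]
local_proj = [(1.0, 0.0), (-1.0, 0.0)]
out, weights = dynamic_conv1d(signal, kernels, global_proj, local_proj)

# With all-zero projections the mixture is uniform, i.e. a static kernel.
zero_proj = [(0.0, 0.0), (0.0, 0.0)]
static_out, static_weights = dynamic_conv1d(signal, kernels, zero_proj, zero_proj)
```

In this sketch the kernel applied at each frame changes with the input, whereas setting the projections to zero recovers an ordinary convolution with a single fixed kernel. A trained system would learn the candidate kernels and projections jointly and operate on multi-channel time-frequency features rather than a scalar sequence.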
Pages: 404-408
Number of pages: 5
Related papers
50 in total
  • [1] The global-local transformation for invariant shape representation
    Raftopoulos, Konstantinos A.
    Kollias, Stefanos D.
    ADVANCES IN VISUAL COMPUTING, PROCEEDINGS, PT 2, 2007, 4842 : 224 - 233
  • [2] A Dynamic Convolution Framework for Session-Independent Speaker Embedding Learning
    Gu, Bin
    Zhang, Jie
    Guo, Wu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3647 - 3658
  • [3] Global-Local Self-Distillation for Visual Representation Learning
    Lebailly, Tim
    Tuytelaars, Tinne
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1441 - 1450
  • [4] GLPose: Global-Local Representation Learning for Human Pose Estimation
    Jiao, Yingying
    Chen, Haipeng
    Feng, Runyang
    Chen, Haoming
    Wu, Sifan
    Yin, Yifang
    Liu, Zhenguang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
  • [5] Robust facial expression recognition with global-local joint representation learning
    Chunxiao Fan
    Zhenxing Wang
    Jia Li
    Shanshan Wang
    Xiao Sun
    Multimedia Systems, 2023, 29 : 3069 - 3079
  • [6] Robust facial expression recognition with global-local joint representation learning
    Fan, Chunxiao
    Wang, Zhenxing
    Li, Jia
    Wang, Shanshan
    Sun, Xiao
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 3069 - 3079
  • [7] Global, local representation and speaker perks
    Lorsbach, Beth
    Chemical and Engineering News, 2024, 102 (16):
  • [8] Video Captioning Using Global-Local Representation
    Yan, Liqi
    Ma, Siqi
    Wang, Qifan
    Chen, Yingjie
    Zhang, Xiangyu
    Savakis, Andreas
    Liu, Dongfang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6642 - 6656
  • [9] Global-local contrastive multiview representation learning for skeleton-based action
    Bian, Cunling
    Feng, Wei
    Meng, Fanbo
    Wang, Song
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229
  • [10] Video representation learning for temporal action detection using global-local attention
    Tang, Yiping
    Zheng, Yang
    Wei, Chen
    Guo, Kaitai
    Hu, Haihong
    Liang, Jimin
    PATTERN RECOGNITION, 2022, 134