CentriForce: Multiple-Domain Adaptation for Domain-Invariant Speaker Representation Learning

Cited by: 3
Authors
Wei, Yuheng [1 ]
Du, Junzhao [1 ]
Liu, Hui [1 ]
Zhang, Zhipeng [1 ]
Affiliations
[1] Xidian University, School of Computer Science and Technology, Xi'an 710071, Shaanxi, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Training; Speaker recognition; Mathematical models; Adaptation models; Speech recognition; Representation learning; Task analysis; Multiple speech sources; multiple-domain adaptation; speaker embedding; speaker recognition
DOI
10.1109/LSP.2022.3154237
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Code
0808; 0809
Abstract
In the real world, speaker recognition systems usually suffer serious performance degradation due to the domain mismatch between training and test conditions. To alleviate the harmful effect of domain shift, unsupervised domain adaptation methods have been introduced to learn domain-invariant speaker representations, but they focus on the single-source-to-single-target adaptation setting. In practice, labeled speaker data are usually collected from multiple sources, such as different languages, genres, and devices, and single-domain adaptation methods cannot handle this more complex multiple-domain mismatch. To address this issue, we propose a multiple-domain adaptation framework named CentriForce to extract domain-invariant speaker representations for speaker recognition. Unlike previous methods, CentriForce learns multiple domain-related speaker representation spaces. To mitigate the multiple-domain mismatch, CentriForce reduces the Wasserstein distance between each pair of source and target domains in their domain-related representation space, while using the target domain as an anchor point to draw all source domains closer to each other. In our experiments, CentriForce achieves the best performance on most of the 16 challenging adaptation tasks, compared with other competing adaptation methods. An ablation study and representation visualizations further demonstrate its effectiveness in learning domain-invariant speaker embeddings.
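The paper's exact formulation is behind the DOI above; purely as a rough illustration of the two loss ingredients the abstract names (pairwise source-to-target Wasserstein alignment, plus using the target domain as an anchor that pulls the source domains together), the sketch below combines a sliced-Wasserstein estimate with a target-anchored centering term. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sliced-Wasserstein estimator, the `alpha` weight, and the single shared embedding space (the paper learns multiple domain-related spaces) are all assumptions made for demonstration.

```python
import torch

def sliced_wasserstein(x, y, n_proj=64):
    """Monte-Carlo sliced 1-Wasserstein distance between two batches of
    embeddings (rows = samples, cols = dims). Assumes equal batch sizes."""
    theta = torch.randn(x.size(1), n_proj)           # random projection directions
    theta = theta / theta.norm(dim=0, keepdim=True)  # unit-norm columns
    proj_x, _ = torch.sort(x @ theta, dim=0)         # sorted 1-D projections
    proj_y, _ = torch.sort(y @ theta, dim=0)
    return (proj_x - proj_y).abs().mean()            # average over slices and samples

def multi_domain_alignment_loss(source_batches, target_batch, alpha=1.0):
    """Illustrative CentriForce-style objective (hypothetical helper):
    align every source domain with the target (pairwise Wasserstein term)
    and pull all source-domain centroids toward the target centroid
    (anchor term), so the sources also move closer to one another."""
    pairwise = torch.stack(
        [sliced_wasserstein(s, target_batch) for s in source_batches]
    ).mean()
    target_center = target_batch.mean(dim=0)
    anchor = torch.stack(
        [(s.mean(dim=0) - target_center).pow(2).sum() for s in source_batches]
    ).mean()
    return pairwise + alpha * anchor

# Toy usage: three source domains, one unlabeled target domain,
# 32 speaker embeddings of dimension 128 per domain.
torch.manual_seed(0)
sources = [torch.randn(32, 128) for _ in range(3)]
target = torch.randn(32, 128)
print(multi_domain_alignment_loss(sources, target))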
Pages: 807-811
Page count: 5
Related Papers
50 records in total
  • [1] Niu, Ziwei; Yuan, Junkun; Ma, Xu; Xu, Yingying; Liu, Jing; Chen, Yen-Wei; Tong, Ruofeng; Lin, Lanfen. Knowledge Distillation-Based Domain-Invariant Representation Learning for Domain Generalization. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 245-255.
  • [2] Kang, Jiawen; Liu, Ruiqi; Li, Lantian; Cai, Yunqi; Wang, Dong; Zheng, Thomas Fang. Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning. INTERSPEECH 2020, 2020: 3825-3829.
  • [3] Zhang, Hanyi; Wang, Longbiao; Lee, Kong Aik; Liu, Meng; Dang, Jianwu; Meng, Helen. Meta-Generalization for Domain-Invariant Speaker Verification. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 1024-1036.
  • [4] Li, Longxin; Mak, Man-Wai; Chien, Jen-Tzung. Contrastive Adversarial Domain Adaptation Networks for Speaker Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(05): 2236-2245.
  • [5] Zhang, Yanning; Tao, Jianwen; Yan, Liangda. Domain-Invariant Label Propagation With Adaptive Graph Regularization. IEEE ACCESS, 2024, 12: 190728-190745.
  • [6] Yang, Zhengeng; Yu, Hongshan; Sun, Wei; Cheng, Li; Mian, Ajmal. Domain-Invariant Prototypes for Semantic Segmentation. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(08): 7614-7627.
  • [7] Kishida, Takuya; Tsukamoto, Shin; Nakashika, Toru. Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM. INTERSPEECH 2020, 2020: 3431-3435.
  • [8] Wang, Zhenyu; Hansen, John H. L. Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 60-75.
  • [9] Yang, Shanmin; Fu, Keren; Yang, Xiao; Lin, Ye; Zhang, Jianwei; Peng, Cheng. Learning Domain-Invariant Discriminative Features for Heterogeneous Face Recognition. IEEE ACCESS, 2020, 8: 209790-209801.
  • [10] Lu, Cheng; Zong, Yuan; Zheng, Wenming; Li, Yang; Tang, Chuangao; Schuller, Bjoern W. Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 2217-2230.