CentriForce: Multiple-Domain Adaptation for Domain-Invariant Speaker Representation Learning

Cited by: 3
Authors
Wei, Yuheng [1 ]
Du, Junzhao [1 ]
Liu, Hui [1 ]
Zhang, Zhipeng [1 ]
Affiliations
[1] Xidian University, School of Computer Science and Technology, Xi'an 710071, Shaanxi, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Training; Speaker recognition; Mathematical models; Adaptation models; Speech recognition; Representation learning; Task analysis; Multiple speech sources; multiple-domain adaptation; speaker embedding; speaker recognition
DOI
10.1109/LSP.2022.3154237
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Code
0808; 0809
Abstract
In the real world, speaker recognition systems usually suffer serious performance degradation due to the domain mismatch between training and test conditions. To alleviate the harmful effect of domain shift, unsupervised domain adaptation methods have been introduced to learn domain-invariant speaker representations, but they focus on the single-source-to-single-target adaptation setting. In practice, labeled speaker data are usually collected from multiple sources, such as different languages, genres, and devices, and single-domain adaptation methods cannot handle this more complex multiple-domain mismatch. To address this issue, we propose a multiple-domain adaptation framework named CentriForce to extract domain-invariant speaker representations for speaker recognition. Unlike previous methods, CentriForce learns multiple domain-related speaker representation spaces. To mitigate the multiple-domain mismatch, CentriForce reduces the Wasserstein distance between each pair of source and target domains in their domain-related representation space, while using the target domain as an anchor point to draw all source domains closer to each other. In our experiments, CentriForce achieves the best performance on most of the 16 challenging adaptation tasks, compared with other competing adaptation methods. An ablation study and representation visualizations further demonstrate its effectiveness in learning domain-invariant speaker embeddings.
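The paper's exact formulation is behind the DOI above; purely as a rough illustration of the two loss ingredients the abstract names (pairwise source-to-target Wasserstein alignment, plus using the target domain as an anchor that pulls the source domains together), the sketch below combines a sliced-Wasserstein estimate with a target-anchored centering term. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sliced-Wasserstein estimator, the `alpha` weight, and the single shared embedding space (the paper learns multiple domain-related spaces) are all assumptions made for demonstration.

```python
import torch

def sliced_wasserstein(x, y, n_proj=64):
    """Monte-Carlo sliced 1-Wasserstein distance between two batches of
    embeddings (rows = samples, cols = dims). Assumes equal batch sizes."""
    theta = torch.randn(x.size(1), n_proj)           # random projection directions
    theta = theta / theta.norm(dim=0, keepdim=True)  # unit-norm columns
    proj_x, _ = torch.sort(x @ theta, dim=0)         # sorted 1-D projections
    proj_y, _ = torch.sort(y @ theta, dim=0)
    return (proj_x - proj_y).abs().mean()            # average over slices and samples

def multi_domain_alignment_loss(source_batches, target_batch, alpha=1.0):
    """Illustrative CentriForce-style objective (hypothetical helper):
    align every source domain with the target (pairwise Wasserstein term)
    and pull all source-domain centroids toward the target centroid
    (anchor term), so the sources also move closer to one another."""
    pairwise = torch.stack(
        [sliced_wasserstein(s, target_batch) for s in source_batches]
    ).mean()
    target_center = target_batch.mean(dim=0)
    anchor = torch.stack(
        [(s.mean(dim=0) - target_center).pow(2).sum() for s in source_batches]
    ).mean()
    return pairwise + alpha * anchor

# Toy usage: three source domains, one unlabeled target domain,
# 32 speaker embeddings of dimension 128 per domain.
torch.manual_seed(0)
sources = [torch.randn(32, 128) for _ in range(3)]
target = torch.randn(32, 128)
print(multi_domain_alignment_loss(sources, target))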
Pages: 807-811
Page count: 5
Related Papers
50 records in total
  • [1] Niu, Ziwei; Yuan, Junkun; Ma, Xu; Xu, Yingying; Liu, Jing; Chen, Yen-Wei; Tong, Ruofeng; Lin, Lanfen. Knowledge Distillation-Based Domain-Invariant Representation Learning for Domain Generalization. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 245-255.
  • [2] Kang, Jiawen; Liu, Ruiqi; Li, Lantian; Cai, Yunqi; Wang, Dong; Zheng, Thomas Fang. Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning. INTERSPEECH 2020, 2020: 3825-3829.
  • [3] Zhang, Hanyi; Wang, Longbiao; Lee, Kong Aik; Liu, Meng; Dang, Jianwu; Meng, Helen. Meta-Generalization for Domain-Invariant Speaker Verification. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 1024-1036.
  • [4] Li, Longxin; Mak, Man-Wai; Chien, Jen-Tzung. Contrastive Adversarial Domain Adaptation Networks for Speaker Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(05): 2236-2245.
  • [5] Zhang, Yanning; Tao, Jianwen; Yan, Liangda. Domain-Invariant Label Propagation With Adaptive Graph Regularization. IEEE ACCESS, 2024, 12: 190728-190745.
  • [6] Yang, Zhengeng; Yu, Hongshan; Sun, Wei; Cheng, Li; Mian, Ajmal. Domain-Invariant Prototypes for Semantic Segmentation. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(08): 7614-7627.
  • [7] Kishida, Takuya; Tsukamoto, Shin; Nakashika, Toru. Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM. INTERSPEECH 2020, 2020: 3431-3435.
  • [8] Wang, Zhenyu; Hansen, John H. L. Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 60-75.
  • [9] Yang, Shanmin; Fu, Keren; Yang, Xiao; Lin, Ye; Zhang, Jianwei; Peng, Cheng. Learning Domain-Invariant Discriminative Features for Heterogeneous Face Recognition. IEEE ACCESS, 2020, 8: 209790-209801.
  • [10] Lu, Cheng; Zong, Yuan; Zheng, Wenming; Li, Yang; Tang, Chuangao; Schuller, Bjoern W. Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 2217-2230.