Combining Self-supervised Learning and Adversarial Training based Domain Adaptation for Speaker Verification

Times cited: 0

Authors:

Chen, Zhengyang [1]

Wang, Shuai [2,3]

Han, Bing [1]

Qian, Yanmin [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Auditory Cognit & Computat Acoust Lab, MoE Key Lab Artificial Intelligence, AI Inst, Shanghai, Peoples R China

[2] Shenzhen Res Inst Big Data, Shenzhen, Peoples R China

[3] Chinese Univ Hong Kong, Shenzhen, Peoples R China

Source:

2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024 | 2024

Keywords:

speech verification; domain adaptation; adversarial training; self-supervised learning;

DOI:

10.1109/ISCSLP63861.2024.10800283

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adapting an existing well-trained system to a new domain using only unlabeled data is a highly sought-after yet challenging task for speaker verification in real-world scenarios. In this paper, we study two domain adaptation methods: adversarial domain adaptation (ADA) and self-supervised learning-based domain adaptation (SSDA). To facilitate the deployment of unsupervised adaptation methods in applications, we conduct a detailed analysis of the characteristics of both strategies. Our findings indicate that SSDA performance is highly sensitive to the amount of target-domain data, whereas ADA is relatively insensitive to data quantity. Furthermore, augmenting the target-domain data improves SSDA performance but degrades ADA performance. To further enhance system performance, we explore the complementarity between ADA and SSDA, and our results demonstrate that the two strategies complement each other. When both are applied jointly, the best system achieves a relative Equal Error Rate (EER) improvement of over 20.0% on the CN-Celeb evaluation set and a relative average EER improvement of over 35.0% on the SRE16 Cantonese and Tagalog evaluation sets under domain-mismatched conditions.
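The gains reported in the abstract are relative EER reductions measured against an unadapted baseline. As a minimal sketch, assuming plain Python and a simple threshold sweep over the observed scores (not the authors' scoring pipeline, which the record does not describe), the following shows how an EER can be estimated from target and non-target trial scores and how a relative improvement is derived from two EERs. The function names `equal_error_rate` and `relative_improvement` are hypothetical.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over every observed score.

    A trial is accepted when its score is >= the threshold t.
    FRR = fraction of target (same-speaker) trials rejected.
    FAR = fraction of non-target (different-speaker) trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    This is a coarse approximation; toolkits usually interpolate the ROC curve.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline_eer: float, adapted_eer: float) -> float:
    """Relative EER reduction, expressed as a percentage of the baseline EER."""
    return 100.0 * (baseline_eer - adapted_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores (hypothetical): one target trial scores below one non-target trial.
    eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(eer)  # 0.25
    # A baseline EER of 10.0% reduced to 8.0% is a 20.0% relative improvement.
    print(relative_improvement(10.0, 8.0))  # 20.0
```

Because the improvement is relative, "over 20.0%" does not mean the EER dropped by 20 percentage points; it means the adapted EER is at most about 80% of the baseline EER.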


Pages: 701-705

Number of pages: 5

Number of references: 32
