Adversarial Domain Adaptation for Speaker Verification using Partially Shared Network

Cited by: 13
Authors
Chen, Zhengyang [1]
Wang, Shuai [1]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, SpeechLab,Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
Adversarial Training; Domain Adaption; Partially Shared Weights; Speaker Verification;
DOI
10.21437/Interspeech.2020-2226
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Speaker verification systems usually suffer large performance degradation when applied to a new dataset from a different domain. In this work, we study domain adaptation between datasets in different languages using domain adversarial training. We introduce a partially shared network based domain adversarial training architecture that learns an asymmetric mapping for the source and target domain embedding extractors. This architecture helps the embedding extractor learn domain-invariant features without sacrificing speaker discrimination ability. For the cross-lingual domain adaptation evaluation, the source domain data is in English from NIST SRE04-10 and Switchboard, and the target domain data is in Cantonese and Tagalog from NIST SRE16. Our results show that the usual adversarial training mode does harm speaker discrimination when the source and target domain embedding extractors are fully shared; in contrast, the proposed architecture solves this problem and achieves approximately 25.0% relative average Equal Error Rate (EER) improvement on the SRE16 Cantonese and Tagalog evaluation.
Pages: 3017-3021
Page count: 5
Related papers
24 in total
[1]  
Agarap, 2018, ARXIV PREPRINT ARXIV, DOI 10.48550/ARXIV.1803.08375
[2]  
Arjovsky M, 2017, PR MACH LEARN RES, V70
[3]  
Bhattacharya G, 2019, INT CONF ACOUST SPEE, P6226, DOI 10.1109/ICASSP.2019.8682064
[4]  
Chen ZY, 2020, INT CONF ACOUST SPEE, P6574, DOI 10.1109/ICASSP40776.2020.9053905
[5]   ArcFace: Additive Angular Margin Loss for Deep Face Recognition [J].
Deng, Jiankang ;
Guo, Jia ;
Xue, Niannan ;
Zafeiriou, Stefanos .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4685-4694
[6]  
Ganin Y, 2016, J MACH LEARN RES, V17
[7]   Angular Softmax for Short-Duration Text-independent Speaker Verification [J].
Huang, Zili ;
Wang, Shuai ;
Yu, Kai .
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, :3623-3627
[8]  
Gulrajani I, 2017, ADV NEUR IN, V30
[9]   Deep feature for text-dependent speaker verification [J].
Liu, Yuan ;
Qian, Yanmin ;
Chen, Nanxin ;
Fu, Tianfan ;
Zhang, Ya ;
Yu, Kai .
SPEECH COMMUNICATION, 2015, 73 :1-13
[10]  
Povey D., 2011, IEEE 2011 WORKSH AUT, P1