SPOOFING-AWARE SPEAKER VERIFICATION ROBUST AGAINST DOMAIN AND CHANNEL MISMATCHES

Cited by: 0

Authors:

Chang, Zeng [1,2]

Miao, Xiaoxiao [3]

Wang, Xin [1]

Cooper, Erica [1]

Yamagishi, Junichi [1,2]

Affiliations:

[1] Natl Inst Informat, Tokyo, Japan

[2] SOKENDAI, Hayama, Kanagawa, Japan

[3] Singapore Inst Technol, Singapore, Singapore

Source:

2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2024

Keywords:

Speaker verification; robustness; multi-task learning; meta-learning; SPEECH; PLDA;

DOI:

10.1109/SLT61566.2024.10832246

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In real-world applications, it is challenging to build a speaker verification system that is simultaneously robust against common threats, including spoofing attacks, channel mismatch, and domain mismatch. Traditional automatic speaker verification (ASV) systems often tackle these issues separately, leading to suboptimal performance when faced with simultaneous challenges. In this paper, we propose an integrated framework that incorporates pair-wise learning and spoofing attack simulation into the meta-learning paradigm to enhance robustness against these multifaceted threats. This novel approach employs an asymmetric dual-path model and a multi-task learning strategy to handle ASV, anti-spoofing, and spoofing-aware ASV tasks concurrently. A new testing dataset, CNComplex, is introduced to evaluate system performance under these combined threats. Experimental results demonstrate that our integrated model significantly improves performance over traditional ASV systems across various scenarios, showcasing its potential for real-world deployment. Additionally, the proposed framework's ability to generalize across different conditions highlights its robustness and reliability, making it a promising solution for practical ASV applications.


Pages: 1150-1157

Page count: 8

