SPOOFING-AWARE SPEAKER VERIFICATION ROBUST AGAINST DOMAIN AND CHANNEL MISMATCHES

Cited by: 0
Authors
Chang, Zeng [1 ,2 ]
Miao, Xiaoxiao [3 ]
Wang, Xin [1 ]
Cooper, Erica [1 ]
Yamagishi, Junichi [1 ,2 ]
Affiliations
[1] Natl Inst Informat, Tokyo, Japan
[2] SOKENDAI, Hayama, Kanagawa, Japan
[3] Singapore Inst Technol, Singapore, Singapore
Source
2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2024
Keywords
Speaker verification; robustness; multi-task learning; meta-learning; SPEECH; PLDA;
DOI
10.1109/SLT61566.2024.10832246
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In real-world applications, it is challenging to build a speaker verification system that is simultaneously robust against common threats, including spoofing attacks, channel mismatch, and domain mismatch. Traditional automatic speaker verification (ASV) systems often tackle these issues separately, leading to suboptimal performance when faced with simultaneous challenges. In this paper, we propose an integrated framework that incorporates pair-wise learning and spoofing attack simulation into the meta-learning paradigm to enhance robustness against these multifaceted threats. This novel approach employs an asymmetric dual-path model and a multi-task learning strategy to handle ASV, anti-spoofing, and spoofing-aware ASV tasks concurrently. A new testing dataset, CNComplex, is introduced to evaluate system performance under these combined threats. Experimental results demonstrate that our integrated model significantly improves performance over traditional ASV systems across various scenarios, showcasing its potential for real-world deployment. Additionally, the proposed framework's ability to generalize across different conditions highlights its robustness and reliability, making it a promising solution for practical ASV applications.
Pages: 1150-1157
Number of pages: 8