Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification

Cited by: 3
Authors
Hong, Qian-Bei [1, 2]
Wu, Chung-Hsien [3]
Wang, Hsin-Min [4]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Acad Sinica, Tainan, Taiwan
[3] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
[4] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
Speaker verification; parent embedding learning; partial adaptive score normalization; RECOGNITION; EMBEDDINGS;
DOI
10.1109/TASLP.2022.3221042
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Generalizing across mismatches between training and testing conditions and resisting interference from other speakers are both crucial to speaker verification performance. In this paper, we propose two novel approaches: one improves generalization to mismatched recording scenarios and languages in test conditions, and the other reduces the influence of interference from other speakers on the similarity measurement of two speaker embeddings. First, parent embedding learning (PEL) is used for model training; it exploits the generalization ability of the shared structure to improve the representation of speaker embeddings. Second, partial adaptive score normalization (PAS-Norm) is used to reduce the influence of interference from other speakers on embedding-based similarity measures. In the experiments, the speaker embedding models are trained on the VoxCeleb2 dataset and evaluated on four other datasets under different conditions: VoxCeleb1, Librispeech, SITW, and CN-Celeb. On VoxCeleb1, evaluations with a large number of verification speakers and with identity restrictions show that the proposed PEL-based system reduces the EER by 6.0% and 4.9%, respectively, compared to the state-of-the-art (SOTA) system. In the mismatched-condition experiments on SITW and CN-Celeb, the proposed PEL-based system also outperforms the SOTA system; in the language-mismatched condition, the EER is reduced by 8.3%. When evaluating the influence of interference from other speakers, replacing the baseline AS-Norm score normalization method with PAS-Norm reduces the EER by 24.4%.
Pages: 486-499
Number of pages: 14
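
The abstract names AS-Norm as the baseline score normalization but does not define it, and it gives no formula for PAS-Norm. As a rough illustration only, the sketch below shows one commonly used form of adaptive score normalization (AS-Norm): the raw cosine score of a trial is normalized with the mean and standard deviation of each side's top-k scores against a cohort of impostor embeddings, and the two normalized values are averaged. The function names (`cosine`, `top_k_stats`, `as_norm`), the choice of `k`, and the toy cohort are assumptions for illustration; this is not the authors' PAS-Norm and may differ from the exact baseline used in the paper.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_stats(emb, cohort, k):
    """Mean and std of the k highest cohort scores for one embedding."""
    scores = sorted((cosine(emb, c) for c in cohort), reverse=True)[:k]
    # Floor the std so a degenerate cohort cannot cause division by zero.
    return mean(scores), max(pstdev(scores), 1e-8)


def as_norm(enroll, test, cohort, k=2):
    """Symmetric adaptive score normalization of a cosine-scored trial.

    Hypothetical helper: an assumed, generic AS-Norm formulation.
    """
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, k)
    mu_t, sd_t = top_k_stats(test, cohort, k)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real systems use much larger vectors.
    cohort = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    enroll = [1.0, 0.1, 0.0]
    print(as_norm(enroll, [0.9, 0.2, 0.0], cohort))  # similar direction
    print(as_norm(enroll, [0.0, 0.1, 1.0], cohort))  # different direction
```

In practice the cohort would be a large set of embeddings from speakers known not to appear in the trial, and `k` would be tuned on held-out data.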