Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification

Cited by: 3
Authors
Hong, Qian-Bei [1, 2]
Wu, Chung-Hsien [3 ]
Wang, Hsin-Min [4 ]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Acad Sinica, Tainan, Taiwan
[3] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
[4] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
Speaker verification; parent embedding learning; partial adaptive score normalization; recognition; embeddings
DOI
10.1109/TASLP.2022.3221042
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The ability to generalize to mismatches between training and testing conditions and resist interference from other speakers is crucial for the performance of speaker verification. In this paper, we propose two novel approaches to improve the generalization ability to deal with the mismatched recorded scenarios and languages in test conditions and to reduce the influence of interference from other speakers on the similarity measurement of two speaker embeddings. First, parent embedding learning (PEL) is used for model training, which exploits the generalization ability of the shared structure to improve the representation of speaker embeddings. Second, partial adaptive score normalization (PAS-Norm) is used to reduce the influence of interference from other speakers on embedding-based similarity measures. In the experiments, the speaker embedding models are trained using the VoxCeleb2 dataset, and the performance is evaluated on four other datasets under different conditions, including VoxCeleb1, Librispeech, SITW, and CN-Celeb datasets. In the experiments on VoxCeleb1, evaluation results considering a large number of verification speakers and identity restrictions show that the proposed PEL-based system reduces the EER by 6.0% and 4.9% in these two cases, respectively, compared to the state-of-the-art (SOTA) system. Furthermore, in the experiments evaluating speaker verification in mismatch conditions on SITW and CN-Celeb, the proposed PEL-based system also outperforms the SOTA system. In the language mismatched conditions, the EER is reduced by 8.3%. For the evaluation of the influence of interference from other speakers, the EER is significantly reduced by 24.4% when PAS-Norm is used instead of the baseline AS-Norm score normalization method.
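As background for the AS-Norm baseline named in the abstract, the following is a minimal sketch of adaptive score normalization as it is commonly formulated: the raw cosine score of a trial is normalized with the mean and standard deviation of the top-k cohort scores of each side, and the two normalized values are averaged. This is not the paper's PAS-Norm, whose details are not given in this record. The function names, the toy cohort, and the value of k are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_stats(embedding, cohort, k):
    """Mean and standard deviation of the k highest cohort scores."""
    scores = sorted((cosine(embedding, c) for c in cohort), reverse=True)[:k]
    return mean(scores), pstdev(scores)


def as_norm(enroll, test, cohort, k=2, eps=1e-8):
    """Symmetric adaptive score normalization of one verification trial.

    Assumption: the cohort is a list of impostor embeddings; k and eps are
    illustrative defaults, not values taken from the paper.
    """
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, k)
    mu_t, sd_t = top_k_stats(test, cohort, k)
    return 0.5 * ((raw - mu_e) / (sd_e + eps) + (raw - mu_t) / (sd_t + eps))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely hypothetical data.
    cohort = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
    enroll = [0.9, 0.1]
    same_speaker = [0.8, 0.3]
    other_speaker = [-1.0, 0.2]
    print(as_norm(enroll, same_speaker, cohort, k=3))
    print(as_norm(enroll, other_speaker, cohort, k=3))
```

Because the statistics come only from the cohort entries closest to each side of the trial, the normalization adapts to the acoustic neighborhood of the two embeddings rather than to the whole cohort.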
Pages: 486-499
Number of pages: 14