Fusion of SNR-Dependent PLDA Models for Noise Robust Speaker Verification

Cited: 0
Authors: Pang, Xiaomin [1]; Mak, Man-Wai [1]
Affiliation: [1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Source: 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014
Keywords: Speaker verification; i-vectors; probabilistic LDA; NIST 2012 SRE; noise robustness; ACOUSTIC FACTOR-ANALYSIS
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The i-vector representation and probabilistic linear discriminant analysis (PLDA) have shown state-of-the-art performance in many speaker verification systems. However, in real-world environments, additive and convolutive noise cause mismatches between training and recognition conditions, degrading the performance. In this paper, a fusion system that combines a multi-condition PLDA model and a mixture of SNR-dependent PLDA models is proposed to make the verification system noise robust. The SNR of test utterances is used to determine the best SNR-dependent PLDA model to score against the target-speaker's i-vectors. The performance of the fusion system is demonstrated on NIST 2012 SRE. Results show that the SNR-dependent PLDA models can reduce EER and that the fusion system is more robust than the conventional i-vector/PLDA systems under noisy conditions. It is also found that the SNR-dependent PLDA models are insensitive to Z-norm parameters.
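The scoring flow the abstract describes can be illustrated with a minimal sketch: pick the SNR-dependent model whose training SNR is nearest to the test utterance's SNR, then fuse its score with the multi-condition model's score. This is an assumption-laden illustration, not the paper's implementation: the `simple_score` cosine similarity is only a stand-in for true PLDA log-likelihood-ratio scoring, and the nearest-SNR selection, model SNR keys, and fusion weight `alpha` are hypothetical choices for the example.

```python
import numpy as np

def simple_score(enroll_ivec, test_ivec):
    """Stand-in for PLDA scoring: cosine similarity between i-vectors.
    Real PLDA scoring computes a log-likelihood ratio under the model."""
    return float(np.dot(enroll_ivec, test_ivec) /
                 (np.linalg.norm(enroll_ivec) * np.linalg.norm(test_ivec)))

def fused_score(enroll_ivec, test_ivec, test_snr_db,
                snr_models, multi_condition_model, alpha=0.5):
    """Select the SNR-dependent model whose training SNR (the dict key,
    in dB) is nearest to the test utterance's SNR, then linearly fuse
    its score with the multi-condition model's score."""
    nearest_snr = min(snr_models, key=lambda snr: abs(snr - test_snr_db))
    s_snr = snr_models[nearest_snr](enroll_ivec, test_ivec)
    s_mc = multi_condition_model(enroll_ivec, test_ivec)
    return alpha * s_snr + (1.0 - alpha) * s_mc

# Toy usage with random 400-dim "i-vectors" and three hypothetical
# SNR-dependent models keyed by their training SNR in dB.
rng = np.random.default_rng(0)
enroll, test = rng.standard_normal(400), rng.standard_normal(400)
models = {6: simple_score, 15: simple_score, 30: simple_score}
score = fused_score(enroll, test, test_snr_db=12.0,
                    snr_models=models, multi_condition_model=simple_score)
```

In this toy run the 12 dB test utterance selects the 15 dB model; in the paper's system the per-model scores come from genuinely different SNR-dependent PLDA models, so the fusion combines complementary information rather than duplicating one score.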
Pages: 619-623 (5 pages)