Bayesian Estimation of PLDA in the Presence of Noisy Training Labels, With Applications to Speaker Verification

Cited by: 4
Authors
Borgstrom, Bengt J. [1 ]
Affiliation
[1] MIT, Lincoln Lab, Lexington, MA 02420 USA
Keywords
Noise measurement; Estimation; Training; Labeling; Data models; Adaptation models; Bayes methods; Speaker verification; probabilistic linear discriminant analysis; noisy labels; Variational Bayes
DOI
10.1109/TASLP.2021.3130980
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206 ; 082403 ;
Abstract
This paper presents a Bayesian framework for estimating a Probabilistic Linear Discriminant Analysis (PLDA) model in the presence of noisy labels. True class labels are interpreted as latent random variables, which are transmitted through a noisy channel and received as observed speaker labels. The labeling process is modeled as a Discrete Memoryless Channel (DMC). PLDA hyperparameters are interpreted as random variables, and their joint posterior distribution is derived using mean-field Variational Bayes, allowing maximum a posteriori (MAP) estimates of the PLDA model parameters to be determined. The proposed solution, referred to as VB-MAP, is presented as a general framework, but is studied in the context of speaker verification, and a variety of use cases are discussed. Specifically, VB-MAP can be used for PLDA estimation with unreliable labels, for unsupervised PLDA estimation, and to infer the reliability of a PLDA training set. Experimental results show the proposed approach to provide significant performance improvements on a variety of NIST Speaker Recognition Evaluation (SRE) tasks, both for data sets with simulated mislabels and for data sets with naturally occurring missing or unreliable labels.
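The abstract's core modeling assumption, that observed speaker labels arise from true labels passed through a Discrete Memoryless Channel, can be illustrated with a small sketch. This is not code from the paper: the symmetric-noise channel, the function names, and the flip probability are all hypothetical choices made only to show what a DMC label-noise model looks like.

```python
import numpy as np

# Sketch (assumed, not from the paper): model the labeling process as a
# Discrete Memoryless Channel. A row-stochastic transition matrix T gives
# P(observed label = j | true label = i), and each label is corrupted
# independently of the others (the "memoryless" property).

def dmc_transition_matrix(n_speakers, flip_prob):
    """Symmetric-noise channel: keep the true label with probability
    1 - flip_prob, otherwise flip uniformly to another speaker."""
    T = np.full((n_speakers, n_speakers), flip_prob / (n_speakers - 1))
    np.fill_diagonal(T, 1.0 - flip_prob)
    return T

def corrupt_labels(true_labels, T, rng):
    """Transmit each true label through the channel independently."""
    return np.array([rng.choice(len(T), p=T[y]) for y in true_labels])

rng = np.random.default_rng(0)
T = dmc_transition_matrix(n_speakers=5, flip_prob=0.2)
true_labels = rng.integers(0, 5, size=1000)
observed = corrupt_labels(true_labels, T, rng)
error_rate = np.mean(observed != true_labels)  # empirically close to 0.2
```

In the paper's setting the true labels are latent, so this generative direction is inverted at training time: VB-MAP infers the posterior over true labels and PLDA hyperparameters jointly rather than observing them.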
Pages: 414 - 428
Page count: 15