Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification

Cited by: 1
Authors
Lu, Xugang [1]
Shen, Peng [1]
Tsao, Yu [2]
Kawai, Hisashi [1]
Affiliations
[1] National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Kyoto 6190288, Japan
[2] Academia Sinica, Research Center for Information Technology Innovation, Taipei 115, Taiwan
Keywords
Feature extraction; Data models; Task analysis; Measurement; Training; Solid modeling; Neural networks; Discriminative model; generative model; joint Bayesian model; speaker verification; RECOGNITION; MACHINES;
DOI
10.1109/TASLP.2021.3129360
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The task of speaker verification (SV) is to decide whether an utterance is spoken by a target or an imposter speaker. In most SV studies, a log-likelihood ratio (LLR) score is estimated from a generative probability model of speaker features and compared with a threshold to make a decision. However, a generative model usually focuses on individual feature distributions, lacks discriminative feature selection ability, and is easily distracted by nuisance features. SV, as a hypothesis test, can be formulated as a binary discrimination task to which neural-network-based discriminative learning can be applied. In discriminative learning, nuisance features can be removed with the help of label supervision. However, discriminative learning pays more attention to classification boundaries and is prone to overfitting the training set, which may result in poor generalization to a test set. In this paper, we propose a hybrid learning framework that couples the structure and parameters of a joint Bayesian (JB) generative model with a neural discriminative learning framework for SV. In the hybrid framework, a two-branch Siamese neural network is built with dense layers that are coupled with the factorized affine transforms used in the JB model. The LLR score estimation of the JB model is formulated according to the distance metric in the discriminative learning framework. After initializing the two-branch neural network with the generatively learned parameters of the JB model, we further train the model parameters on pairwise samples as a binary discrimination task. Moreover, a direct evaluation metric (DEM) for SV based on minimum empirical Bayes risk (EBR) is designed and integrated as an objective function in the discriminative learning. We carried out SV experiments on Speakers in the Wild (SITW) and VoxCeleb. Experimental results showed that our proposed model improved performance by a large margin compared with state-of-the-art models for SV.
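The abstract does not give the JB equations, so the following is only a minimal sketch of how an LLR score can be computed and thresholded. It assumes the common two-covariance formulation (feature = speaker identity + within-speaker variation, both zero-mean Gaussian) and, as a further simplification, diagonal covariances so that each dimension can be scored independently. The function names `jb_llr` and `accept`, the variance arguments, and the threshold value are hypothetical and are not taken from the paper.

```python
import math

def jb_llr(x1, x2, between_var, within_var):
    """LLR of 'same speaker' vs 'different speakers' for two feature vectors.

    Assumption (not from the paper): per dimension, x = mu + eps with
    mu ~ N(0, b) shared by a speaker and eps ~ N(0, w) independent per
    utterance, with diagonal covariances.
    """
    llr = 0.0
    for a, c, b, w in zip(x1, x2, between_var, within_var):
        total = b + w
        # Same speaker: covariance [[b+w, b], [b, b+w]].
        det_same = total * total - b * b
        quad_same = (total * (a * a + c * c) - 2.0 * b * a * c) / det_same
        # Different speakers: covariance [[b+w, 0], [0, b+w]].
        det_diff = total * total
        quad_diff = (a * a + c * c) / total
        # The 2*pi normalizers are identical under both hypotheses and cancel.
        llr += -0.5 * (math.log(det_same) + quad_same)
        llr += 0.5 * (math.log(det_diff) + quad_diff)
    return llr

def accept(x1, x2, between_var, within_var, threshold=0.0):
    """Accept the trial as a target speaker if the LLR reaches the threshold."""
    return jb_llr(x1, x2, between_var, within_var) >= threshold

if __name__ == "__main__":
    b, w = [1.0, 1.0], [0.1, 0.1]
    print(jb_llr([1.0, 0.5], [0.9, 0.6], b, w))   # similar vectors
    print(jb_llr([1.0, 0.5], [-1.0, -0.4], b, w))  # dissimilar vectors
```

In this sketch similar vectors give a positive score and dissimilar vectors a negative one when between-speaker variance dominates. A full-covariance model would replace the per-dimension loop with matrix operations.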
Pages: 3631-3641
Number of pages: 11