DEEP NEURAL NETWORK DRIVEN MIXTURE OF PLDA FOR ROBUST I-VECTOR SPEAKER VERIFICATION

Times cited: 0
Authors
Li, Na [1]
Mak, Man-Wai [1]
Chien, Jen-Tzung [2]
Affiliations
[1] Hong Kong Polytechnic University, Department of Electronic and Information Engineering, Hong Kong, People's Republic of China
[2] National Chiao Tung University, Department of Electrical and Computer Engineering, Hsinchu, Taiwan
Source
2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016) | 2016
Keywords
Speaker verification; i-vector; mixture of PLDA; deep neural networks; SNR mismatch
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In speaker recognition, mismatch between enrollment and test utterances caused by noise at different signal-to-noise ratios (SNRs) poses a major challenge. Based on the observation that noise-level variability causes i-vectors to form heterogeneous clusters, this paper proposes using an SNR-aware deep neural network (DNN) to guide the training of a mixture of probabilistic linear discriminant analysis (PLDA) models. Specifically, given an i-vector, the SNR posterior probabilities produced by the DNN are used as the posteriors of the mixture model's indicator variables. As a result, the proposed model provides a more reasonable soft division of the i-vector space than the conventional mixture of PLDA. During verification, the marginal likelihoods of a test trial under the individual PLDA models are linearly combined, weighted by the SNR-level posterior probabilities computed by the DNN. Experimental results on SNR-mismatch tasks based on NIST 2012 SRE suggest that the proposed model handles heterogeneous corpora more effectively than both PLDA and the conventional mixture of PLDA.
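The verification rule summarized in the abstract (per-component PLDA marginal likelihoods combined with weights given by DNN SNR posteriors) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes scalar (1-D) i-vectors so that it runs with the standard library alone, assumes every PLDA component shares one speaker factor z, lets the enrollment and test i-vectors be generated by different SNR components, and treats the DNN's SNR posteriors as given inputs (`gamma_s`, `gamma_t`). The names `ScalarPLDA` and `mixture_plda_score` are illustrative only.

```python
import math


class ScalarPLDA:
    """Hypothetical 1-D PLDA component: x = mean + loading * z + noise,
    with z ~ N(0, 1) and noise ~ N(0, noise_var). Real i-vectors are
    vectors and real PLDA uses loading matrices; scalars keep this short."""

    def __init__(self, mean, loading, noise_var):
        self.mean = mean
        self.loading = loading
        self.noise_var = noise_var

    def total_var(self):
        # Marginal variance of x after integrating out the speaker factor z.
        return self.loading ** 2 + self.noise_var


def gauss1(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gauss2(x1, x2, m1, m2, v11, v22, v12):
    """Bivariate Gaussian density with covariance [[v11, v12], [v12, v22]]."""
    det = v11 * v22 - v12 ** 2
    d1, d2 = x1 - m1, x2 - m2
    quad = (v22 * d1 ** 2 - 2 * v12 * d1 * d2 + v11 * d2 ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def mixture_plda_score(x_s, x_t, gamma_s, gamma_t, components):
    """Log-likelihood ratio of the trial (x_s, x_t). gamma_s and gamma_t are
    the SNR posteriors of the enrollment and test i-vectors (assumed to come
    from an SNR-aware DNN; here they are plain lists of weights)."""
    # Same speaker: both i-vectors share one latent z, so their covariance
    # is loading_ks * loading_kt; each may come from its own component.
    p_same = 0.0
    for gs, cs in zip(gamma_s, components):
        for gt, ct in zip(gamma_t, components):
            p_same += gs * gt * gauss2(x_s, x_t, cs.mean, ct.mean,
                                       cs.total_var(), ct.total_var(),
                                       cs.loading * ct.loading)
    # Different speakers: independent latent factors, so the joint
    # likelihood factorizes into two posterior-weighted marginal mixtures.
    p_s = sum(g * gauss1(x_s, c.mean, c.total_var())
              for g, c in zip(gamma_s, components))
    p_t = sum(g * gauss1(x_t, c.mean, c.total_var())
              for g, c in zip(gamma_t, components))
    return math.log(p_same) - math.log(p_s * p_t)
```

For example, with `comps = [ScalarPLDA(0.0, 2.0, 0.5), ScalarPLDA(1.0, 1.5, 1.5)]`, calling `mixture_plda_score(2.0, 1.5, [0.9, 0.1], [0.3, 0.7], comps)` scores a trial whose enrollment i-vector looks mostly like the first SNR group and whose test i-vector looks mostly like the second. Setting a posterior vector to a one-hot such as `[1.0, 0.0]` reduces the score to that of a single PLDA model, which is a useful sanity check. Note that the sketch combines likelihoods before taking the logarithm, which is not the same as averaging per-component log-likelihood ratios.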
Pages: 186-191
Page count: 6