MODELLING SPEAKER AND CHANNEL VARIABILITY USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION

Cited by: 0
Authors
Bhattacharya, Gautam [1 ,2 ]
Alam, Jahangir [1 ]
Kenny, Patrick [1 ]
Gupta, Vishwa [1 ]
Affiliations
[1] Comp Res Inst Montreal, Montreal, PQ, Canada
[2] McGill Univ, Montreal, PQ, Canada
Source
2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016) | 2016
Keywords
i-vectors; deep neural networks; speaker verification; PLDA;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose to improve the performance of i-vector based speaker verification by processing the i-vectors with a deep neural network before they are fed to a cosine distance or probabilistic linear discriminant analysis (PLDA) classifier. To this end, we build on an existing model that we refer to as Non-linear Within Class Normalization (NWCN) and introduce a novel Speaker Classifier Network (SCN). Both models deliver impressive speaker verification performance, showing relative improvements of 56% and 68%, respectively, over standard i-vectors when combined with a cosine distance backend. The NWCN model also reduces the equal error rate for PLDA from 1.78% to 1.63%. We also test these models under the constraints of domain mismatch, i.e. when no in-domain training data is available. Under these conditions, SCN features in combination with cosine distance perform better than the PLDA baseline, achieving an equal error rate of 2.92% as compared to 3.37%.
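The abstract relies on three quantities that are easy to misread: a cosine backend score, the equal error rate (EER), and a relative improvement. The following is a minimal sketch of each, not the authors' implementation. The function names are hypothetical, the vectors are plain Python lists standing in for i-vectors or network outputs, and the EER routine is a simple threshold sweep that assumes higher scores mean "same speaker".

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment vector and a test vector.

    Assumption: both are equal-length, non-zero lists of floats.
    """
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Returns the mean of the false-rejection and false-acceptance rates at
    the threshold where the two rates are closest to each other.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # Target trials scored below the threshold are falsely rejected.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # Non-target trials scored at or above it are falsely accepted.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline
```

Applied to the figures quoted in the abstract, `relative_reduction(1.78, 1.63)` is about 0.084, so the PLDA gain from NWCN is roughly 8% relative, and `relative_reduction(3.37, 2.92)` is about 0.134 for the domain-mismatch comparison. The abstract does not state which metric the 56% and 68% figures refer to, so they are not recomputed here.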
Pages: 192-198
Page count: 7