DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION

Times cited: 0
Authors
Snyder, David [1]
Ghahremani, Pegah
Povey, Daniel
Garcia-Romero, Daniel
Carmiel, Yishay
Khudanpur, Sanjeev
Affiliation
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Source
2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016) | 2016
Keywords
speaker verification; deep neural networks; end-to-end training; recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and the same function is reused to score trials during verification. Similar systems have recently shown promise for text-dependent verification, but we believe this approach has not yet been explored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and by 29% when pooled across test conditions. The fused system achieves relative reductions of 32% on average and 38% pooled.
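The abstract names three ingredients: collapsing a variable-length segment into one embedding, a pairwise objective that doubles as the verification score, and EER as the evaluation metric. The Python sketch below illustrates them under stated assumptions. Mean pooling over frames and a PLDA-like quadratic score with a hypothetical symmetric matrix `S` and offset `b` are assumed choices, not details given in this record, and every function name is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Average frame-level vectors over time so a segment of any length
    yields one fixed-size vector (assumed pooling choice)."""
    if not frames:
        raise ValueError("segment has no frames")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def pair_score(x: Vector, y: Vector, S: Sequence[Vector], b: float) -> float:
    """Hypothetical PLDA-like score L(x, y) = x'y - x'Sx - y'Sy + b."""
    def quad(u: Vector) -> float:
        # Quadratic form u' S u.
        return _dot(u, [_dot(row, u) for row in S])
    return _dot(x, y) - quad(x) - quad(y) + b


def _log_sigmoid(z: float) -> float:
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pair_loss(same: Sequence[float], diff: Sequence[float]) -> float:
    """Cross-entropy that rewards high same-speaker scores and low
    different-speaker scores; each class is averaged separately so the
    (usually far more numerous) different-speaker pairs do not dominate."""
    loss_same = -sum(_log_sigmoid(s) for s in same) / len(same)
    # log(1 - sigmoid(s)) == log(sigmoid(-s))
    loss_diff = -sum(_log_sigmoid(-s) for s in diff) / len(diff)
    return loss_same + loss_diff


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Sweep thresholds over all observed scores and return the average of
    miss and false-alarm rates where the two are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        p_miss = sum(s < t for s in target) / len(target)
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, best_eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return best_eer


def relative_reduction(baseline: float, system: float) -> float:
    """Relative improvement, e.g. 0.13 means 13% lower than the baseline."""
    return (baseline - system) / baseline
```

Because one scoring function feeds both `pair_loss` during training and the accept/reject decision at test time, training and verification optimize the same quantity, which is roughly what the abstract means by reusing the objective during verification. The EER figures quoted above are relative reductions in the sense of `relative_reduction`.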
Pages: 165-170
Number of pages: 6