JOINT I-VECTOR WITH END-TO-END SYSTEM FOR SHORT DURATION TEXT-INDEPENDENT SPEAKER VERIFICATION

Citations: 0
Authors
Huang, Zili [1]
Wang, Shuai [1]
Qian, Yanmin [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, Speech Lab,Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
speaker verification; end-to-end; i-vector; triplet loss; hard trial selection; EMBEDDINGS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The factor-analysis-based i-vector has been the state-of-the-art method for speaker verification. Recently, researchers have proposed DNN-based end-to-end speaker verification systems that achieve performance comparable to the i-vector. Since the two methods have distinct properties and differ significantly from each other, we explore a framework that integrates the two paradigms to exploit their complementarity. Specifically, we develop and compare four methods for integrating the traditional i-vector into end-to-end systems: score fusion, embedding concatenation, transformed concatenation, and joint learning. All of these approaches achieve significant gains. Moreover, hard trial selection is performed on the end-to-end architecture, which further improves performance. Experimental results on a text-independent short-duration dataset generated from SRE 2010 show that the newly proposed method reduces the EER by a relative 31.0% and 28.2% compared with the i-vector and end-to-end baselines, respectively.
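The two simplest integration methods named in the abstract, score fusion and embedding concatenation, together with a triplet loss and a hard-negative pick, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: cosine scoring, the fusion weight, the margin, the function names, and the toy vectors are all hypothetical, since the abstract does not specify them.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity, used here as an assumed trial-scoring function."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def score_fusion(ivector_score, e2e_score, weight=0.5):
    """Weighted sum of the two systems' trial scores (weight is hypothetical)."""
    return weight * ivector_score + (1.0 - weight) * e2e_score


def concat_embeddings(ivector, e2e_embedding):
    """Concatenate the length-normalized i-vector and end-to-end embedding."""
    return l2_normalize(ivector) + l2_normalize(e2e_embedding)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the same-speaker pair should outscore the impostor pair by `margin`."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)


def hardest_negative(anchor, negatives):
    """Pick the impostor embedding most similar to the anchor (one form of hard selection)."""
    return max(negatives, key=lambda neg: cosine(anchor, neg))


# Toy enrollment/test vectors, made up purely for illustration.
enroll_iv, test_iv = [1.0, 2.0, 0.5], [0.9, 2.1, 0.4]
enroll_e2e, test_e2e = [0.3, -0.7], [0.2, -0.9]

fused = score_fusion(cosine(enroll_iv, test_iv), cosine(enroll_e2e, test_e2e))
joint = cosine(concat_embeddings(enroll_iv, enroll_e2e),
               concat_embeddings(test_iv, test_e2e))
```

Under these assumptions, cosine scoring of the concatenated unit-length vectors equals equal-weight score fusion, so `fused` and `joint` agree up to rounding. The transformed concatenation and joint learning variants involve trained layers and are not sketched here.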
Pages: 4869-4873
Page count: 5