JOINT I-VECTOR WITH END-TO-END SYSTEM FOR SHORT DURATION TEXT-INDEPENDENT SPEAKER VERIFICATION

被引：0

作者：

Huang, Zili ^{[1
]}

Wang, Shuai ^{[1
]}

Qian, Yanmin ^{[1
,2
]}

机构：

[1] Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, Speech Lab,Dept Comp Sci & Engn, Shanghai, Peoples R China

[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA

来源：

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018年

关键词：

speaker verification; end-to-end; i-vector; triplet loss; hard trial selection; EMBEDDINGS;

D O I：

暂无

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

Factor analysis based i-vector has been the state-of-the-art method for speaker verification. Recently, researchers propose to build DNN based end-to-end speaker verification systems and achieve comparable performance with i-vector. Since these two methods possess their own property and differ from each other significantly, we explore a framework to integrate these two paradigms together to utilize their complementarity. More specifically, in this paper we develop and compare four methodologies to integrate traditional i-vector into end-to-end systems, including score fusion, embeddings concatenation, transformed concatenation and joint learning. All these approaches achieve significant gains. Moreover, the hard trial selection is performed on the end-to-end architecture which further improves the performance. Experimental results on a text-independent short-duration dataset generated from SRE 2010 reveal that the newly proposed method reduces the EER by relative 31.0% and 28.2% compared to the i-vector and end-to-end baselines respectively.

引用

页码：4869 / 4873

页数：5

共 24 条

[1]

[Anonymous], 2015, ARXIV150400923

[2]

Bredin H, 2017, INT CONF ACOUST SPEE, P5430, DOI 10.1109/ICASSP.2017.7953194

[3] Speaker recognition: A tutorial [J].

Campbell, JP .

PROCEEDINGS OF THE IEEE, 1997, 85 (09) :1437-1462

[4]

Chen N., 2015, 16 ANN C INT SPEECH

[5] Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition [J].

Dahl, George E. ;

Yu, Dong ;

Deng, Li ;

Acero, Alex .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01) :30-42

[6] Front-End Factor Analysis for Speaker Verification [J].

Dehak, Najim ;

Kenny, Patrick J. ;

Dehak, Reda ;

Dumouchel, Pierre ;

Ouellet, Pierre .

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798

[7]

Fu TF, 2014, INTERSPEECH, P1327

[8]

Heigold G, 2016, INT CONF ACOUST SPEE, P5115, DOI 10.1109/ICASSP.2016.7472652

[9] Deep Neural Networks for Acoustic Modeling in Speech Recognition [J].

Hinton, Geoffrey ;

Deng, Li ;

Yu, Dong ;

Dahl, George E. ;

Mohamed, Abdel-rahman ;

Jaitly, Navdeep ;

Senior, Andrew ;

Vanhoucke, Vincent ;

Patrick Nguyen ;

Sainath, Tara N. ;

Kingsbury, Brian .

IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) :82-97

[10]

Li LT, 2015, ASIAPAC SIGN INFO PR, P426, DOI 10.1109/APSIPA.2015.7415306

← 1 2 3 →