Speaker Recognition Using e-Vectors

Cited by: 18
Authors
Cumani, Sandro [1]
Laface, Pietro [1]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, I-10143 Turin, Italy
Keywords
Speaker recognition; eigenvoice; joint factor analysis; i-vectors; e-vectors; PLDA; transformations; variability
DOI
10.1109/TASLP.2018.2791806
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Systems based on i-vectors represent the current state of the art in text-independent speaker recognition. Unlike joint factor analysis (JFA), which models the speaker and intersession subspaces separately, the i-vector approach models all the important variability in a single low-dimensional subspace. This paper is based on the observation that JFA estimates a more informative speaker subspace than the "total variability" i-vector subspace, because the latter is obtained by treating each training segment as if it belonged to a different speaker. We propose a speaker modeling approach that extracts a compact representation of a speech segment, similar to the speaker factors of JFA and to i-vectors, referred to as an "e-vector". Estimating the e-vector subspace follows a procedure similar to i-vector training, but produces a more accurate speaker subspace, as confirmed by the results of a set of tests performed on the NIST 2012 and 2010 Speaker Recognition Evaluations. Simply replacing i-vectors with e-vectors yields an approximately 10% average improvement in the C-primary cost function across different systems and classifiers. These performance gains come without any additional memory or computational cost with respect to standard i-vector systems.
Pages: 736-748
Number of pages: 13