Speaker Recognition Using e-Vectors

Cited by: 18

Authors:

Cumani, Sandro [1]

Laface, Pietro [1]

Affiliations:

[1] Politecn Torino, Dipartimento Automat & Informat, I-10143 Turin, Italy

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018 / Vol. 26 / No. 4

Keywords:

Speaker recognition; eigenvoice; joint factor analysis; i-vectors; e-vectors; PLDA; TRANSFORMATIONS; VARIABILITY;

DOI:

10.1109/TASLP.2018.2791806

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Systems based on i-vectors represent the current state of the art in text-independent speaker recognition. Unlike joint factor analysis (JFA), which models the speaker and intersession subspaces separately, the i-vector approach models all the important variability in a single low-dimensional subspace. This paper is based on the observation that JFA estimates a more informative speaker subspace than the "total variability" i-vector subspace, because the latter is obtained by treating each training segment as belonging to a different speaker. We propose a speaker modeling approach that extracts a compact representation of a speech segment, similar to the speaker factors of JFA and to i-vectors, referred to as an "e-vector." Estimating the e-vector subspace follows a procedure similar to i-vector training, but produces a more accurate speaker subspace, as confirmed by the results of a set of tests performed on the NIST 2012 and 2010 Speaker Recognition Evaluations. Simply replacing the i-vectors with e-vectors yields an approximately 10% average improvement in the C-primary cost function across different systems and classifiers. These performance gains come without any additional memory or computational cost with respect to standard i-vector systems.


Pages: 736-748

Page count: 13
