Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition

Cited by: 19

Authors:

Cumani, Sandro [1]

Laface, Pietro [1]

Affiliations:

[1] Politecn Torino, Dipartimento Automat & Informat, I-10143 Turin, Italy

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2017 / Vol. 25 / No. 4

Keywords:

Density function transformation; i-vectors; probabilistic linear discriminant analysis; speaker recognition;

DOI:

10.1109/TASLP.2017.2674966

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

This paper proposes to estimate parametric nonlinear transformations of i-vectors for speaker recognition systems based on probabilistic linear discriminant analysis (PLDA) classification. The Gaussian PLDA model assumes that the i-vectors are distributed according to the standard normal distribution. However, it has been shown that the i-vectors are better modeled, for example, by heavy-tailed distributions, and that significant improvements in classification performance can be obtained by whitening and length normalizing the i-vectors. In this paper, we propose to transform the i-vectors so that their distribution becomes more suitable for discriminating speakers using the PLDA model. This is performed by means of a sequence of affine and nonlinear transformations whose parameters are obtained by maximum likelihood estimation on the development set. Another contribution of this paper is the reduction of the mismatch between the development and evaluation i-vector length distributions by means of a scaling factor tuned for the estimated i-vector distribution, rather than by means of a blind length normalization. Relative improvements of between 7% and 14% in the detection cost function were obtained with the proposed technique on the NIST SRE-2010 and SRE-2012 evaluation datasets, using both the traditional GMM/UBM and the hybrid DNN/GMM-based systems.
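For orientation, the whitening and length normalization step that the abstract mentions as the existing preprocessing can be sketched as follows. This is a minimal illustration of that standard baseline only, not the nonlinear transformation or the tuned scaling factor proposed in the paper. The function names (`fit_whitener`, `whiten_and_length_normalize`), the use of NumPy, and the synthetic data standing in for i-vectors are all assumptions made for this example.

```python
import numpy as np

def fit_whitener(dev):
    """Estimate the mean and a whitening matrix from development vectors (one per row)."""
    mu = dev.mean(axis=0)
    cov = np.cov(dev - mu, rowvar=False)
    # Symmetric eigendecomposition: cov = V diag(w) V^T
    w, V = np.linalg.eigh(cov)
    # Floor tiny eigenvalues so the inverse square root stays finite
    w = np.maximum(w, 1e-10)
    W = V @ np.diag(w ** -0.5) @ V.T  # symmetric inverse square root of cov
    return mu, W

def whiten_and_length_normalize(x, mu, W):
    """Center and whiten the rows of x, then project each row onto the unit sphere."""
    y = (x - mu) @ W  # W is symmetric, so no transpose is needed
    norms = np.linalg.norm(y, axis=1, keepdims=True)
    return y / norms

# Synthetic stand-in for a development set: 500 correlated 4-dimensional vectors
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
dev = rng.normal(size=(500, 4)) @ A + 3.0

mu, W = fit_whitener(dev)
white = (dev - mu) @ W
normalized = whiten_and_length_normalize(dev, mu, W)
```

In this sketch the whitening parameters are estimated once on the development vectors and would then be reused unchanged for evaluation vectors. The final division by the norm is the "blind" length normalization that the abstract contrasts with a scaling factor tuned to the estimated distribution.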


Pages: 908-919

Number of pages: 12
