Front-end Channel Compensation using Mixture-dependent Feature Transformations for i-Vector Speaker Recognition

Cited by: 0

Authors:

Hasan, Taufiq [1]

Hansen, John H. L. [1]

Affiliations:

[1] Univ Texas Dallas, CRSS, Eric Jonsson Sch Engn, Richardson, TX 75083 USA

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

PCA; GMM;

DOI:

Not available

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

State-of-the-art session variability compensation methods for speaker recognition are generally based on various linear statistical models of the Gaussian Mixture Model (GMM) mean super-vectors, while front-end features are only processed by standard normalization techniques. In this study, we propose a front-end channel compensation framework using mixture-localized linear transforms that operate before super-vector domain modeling begins. In this approach, local linear transforms are trained for each Gaussian component of a Universal Background Model (UBM) and are applied to acoustic features according to their mixture-wise probabilistic alignment, yielding an operation that is globally non-linear. We examine Principal Component Analysis (PCA), whitening, Linear Discriminant Analysis (LDA) and Nuisance Attribute Projection (NAP) as front-end feature transformations. We also propose a method, Nuisance Attribute Elimination (NAE), which is similar to NAP but performs dimensionality reduction in addition to channel compensation. We show that the proposed framework can be readily integrated with a standard i-Vector system by simply applying the transformations to the first-order Baum-Welch statistics and transforming the UBM. Experiments performed on the telephone trials of NIST SRE 2010 demonstrate a significant performance gain from the proposed framework, especially when LDA is used as the front-end transformation.
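The integration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact equations, so the centering of the statistics on the UBM means, the diagonal-covariance UBM, and all function and variable names (`posteriors`, `transformed_first_order_stats`, `transforms`) are assumptions made for illustration. The sketch shows why per-mixture linear transforms can be applied to the first-order Baum-Welch statistics instead of to every frame: for a fixed component the transform is linear, so it commutes with the posterior-weighted sum over frames.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given one frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transformed_first_order_stats(frames, weights, means, variances, transforms):
    """Return zeroth-order stats N_c and transformed first-order stats.

    The first-order statistics are centered on the UBM means (an assumption
    of this sketch) and then multiplied by the component's transform A_c:
        F_c = A_c * sum_t gamma_c(t) * (x_t - mu_c)
    """
    n_comp = len(weights)
    dim = len(means[0])
    zeroth = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for c in range(n_comp):
            zeroth[c] += gamma[c]
            for d in range(dim):
                first[c][d] += gamma[c] * (x[d] - means[c][d])
    return zeroth, [matvec(transforms[c], first[c]) for c in range(n_comp)]


if __name__ == "__main__":
    # Toy two-component UBM in two dimensions; each hypothetical transform
    # projects onto a single axis, reducing the dimension from 2 to 1.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    transforms = [[[1.0, 0.0]], [[0.0, 1.0]]]
    frames = [[0.1, -0.2], [3.9, 4.3], [2.0, 2.0]]
    print(transformed_first_order_stats(frames, weights, means, variances, transforms))
```

In this sketch a transform with fewer rows than columns also reduces the dimension of the statistics, which is the property the abstract attributes to PCA, LDA and NAE. Transforming the UBM itself, which the abstract also mentions, is left out because the abstract does not specify how it is done.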

Pages: 1090-1093

Page count: 4
