Front-end Channel Compensation using Mixture-dependent Feature Transformations for i-Vector Speaker Recognition

Cited by: 0
Authors
Hasan, Taufiq [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Eric Jonsson Sch Engn, Richardson, TX 75083 USA
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
PCA; GMM
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art session variability compensation techniques for speaker recognition are generally based on various linear statistical models of the Gaussian Mixture Model (GMM) mean super-vectors, while front-end features are processed only by standard normalization techniques. In this study, we propose a front-end channel compensation framework using mixture-localized linear transforms that operate before super-vector domain modeling begins. In this approach, local linear transforms are trained for each Gaussian component of a Universal Background Model (UBM) and are applied to acoustic features according to their mixture-wise probabilistic alignment, yielding an operation that is globally non-linear. We examine Principal Component Analysis (PCA), whitening, Linear Discriminant Analysis (LDA) and Nuisance Attribute Projection (NAP) as front-end feature transformations. We also propose a method, Nuisance Attribute Elimination (NAE), which is similar to NAP but performs dimensionality reduction in addition to channel compensation. We show that the proposed framework can be readily integrated with a standard i-Vector system by simply applying the transformations to the first-order Baum-Welch statistics and transforming the UBM. Experiments performed on the telephone trials of the NIST SRE 2010 demonstrate a significant performance gain from the proposed framework, especially when LDA is used as the front-end transformation.
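The integration step described in the abstract (applying mixture-dependent transforms to the first-order Baum-Welch statistics and transforming the UBM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a diagonal-covariance UBM, mean-centered first-order statistics, and one D x K projection matrix per mixture that has already been trained by some method such as PCA or LDA. All function and variable names here are hypothetical.

```python
import numpy as np

def posteriors(X, weights, means, variances):
    """Frame posteriors gamma[t, c] under a diagonal-covariance GMM (assumed UBM form)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_gauss                       # (T, C)
    log_joint -= log_joint.max(axis=1, keepdims=True)             # numerical stability
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def centered_first_order_stats(X, gamma, means):
    """Zeroth-order stats N[c] and mean-centered first-order stats F[c]."""
    N = gamma.sum(axis=0)                                         # (C,)
    F = gamma.T @ X - N[:, None] * means                          # (C, D)
    return N, F

def transform_stats(F, transforms):
    """Apply the mixture-specific projection A_c^T to each row F[c]."""
    # transforms has shape (C, D, K); the result has shape (C, K).
    return np.einsum('cdk,cd->ck', transforms, F)

def transform_ubm_cov(variances, transforms):
    """Project each diagonal UBM covariance: A_c^T diag(var_c) A_c, shape (C, K, K)."""
    return np.einsum('cdk,cd,cdl->ckl', transforms, variances, transforms)
```

Because each transform is linear within its mixture, projecting the accumulated statistics gives the same result as projecting every posterior-weighted, mean-centered frame and then summing, so the features never need to be transformed frame by frame. The overall mapping is still non-linear in the features because the posteriors depend on them.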
Pages: 1090-1093
Page count: 4
Related papers
16 in total
  • [1] Alam MJ, 2011, LECT NOTES ARTIF INT, V7015, P246, DOI 10.1007/978-3-642-25020-0_32
  • [2] [Anonymous], 2011, INTERSPEECH
  • [3] Analysis of feature extraction and channel compensation in a GMM speaker recognition system
    Burget, Lukas
    Matejka, Pavel
    Schwarz, Petr
    Glembek, Ondrej
    Cernocky, Jan 'Honza'
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) : 1979 - 1986
  • [4] Dehak N., 2010, IEEE T AUDIO SPEECH, V19, P788
  • [5] Independent comparative study of PCA, ICA, and LDA on the FERET data set
    Delac, K
    Grgic, M
    Grgic, S
    [J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2005, 15 (05) : 252 - 260
  • [6] Eisele T, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P252, DOI 10.1109/ICSLP.1996.607092
  • [7] GOLUB G, 1965, J SOC IND APPL MAT B, P205
  • [8] Jin Q., 2000, P ICSLP, P250
  • [9] HIERARCHICAL MIXTURES OF EXPERTS AND THE EM ALGORITHM
    JORDAN, MI
    JACOBS, RA
    [J]. NEURAL COMPUTATION, 1994, 6 (02) : 181 - 214
  • [10] Eigenvoice modeling with sparse training data
    Kenny, P
    Boulianne, G
    Dumouchel, P
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03) : 345 - 354