Factor Analysis of Auto-Associative Neural Networks With Application in Speaker Verification

Times cited: 14
Authors
Garimella, Sri [1 ]
Hermansky, Hynek [1 ]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
Factor analysis; i-vector; neural networks; speaker verification;
DOI
10.1109/TNNLS.2012.2236652
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An auto-associative neural network (AANN) is a fully connected feed-forward neural network trained to reconstruct its input at its output through a hidden compression layer, which has fewer nodes than the dimensionality of the input. AANNs are used to model speakers in speaker verification, where a speaker-specific AANN model is obtained by adapting (or retraining) the universal background model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker data. When the amount of speaker data is limited, this adaptation procedure may lead to overfitting, because all the parameters of the UBM-AANN are adapted. In this paper, we introduce and develop the factor analysis theory of AANNs to alleviate this problem. We hypothesize that only the weight matrix connecting the last nonlinear hidden layer and the output layer is speaker-specific, and further restrict it to a common low-dimensional subspace during adaptation. The subspace is learned using large amounts of development data and is held fixed during adaptation. Thus, only the coordinates in this subspace, also known as the i-vector, need to be estimated using speaker-specific data. The update equations are derived for learning both the common low-dimensional subspace and the i-vectors corresponding to speakers in the subspace. The resultant i-vector representation is used as a feature for the probabilistic linear discriminant analysis model. The proposed system shows promising results on the NIST-08 speaker recognition evaluation (SRE), and yields a 23% relative improvement in equal error rate over the previously proposed weighted least squares-based subspace AANN system. The experiments on NIST-10 SRE confirm that these improvements are consistent and generalize across datasets.
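The subspace constraint described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's update equations, which the abstract does not state. It assumes a linear output layer, a squared reconstruction error, and a small ridge term, so that the i-vector has a closed-form least-squares solution. All names (`H`, `X`, `W_ubm`, `T`, `estimate_ivector`, `adapted_weights`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def adapted_weights(W_ubm, T, w):
    """Speaker-specific output weights: UBM weights plus a low-rank offset.

    Assumption: vec(W) = vec(W_ubm) + T @ w, with T fixed after development.
    """
    return W_ubm + (T @ w).reshape(W_ubm.shape)

def estimate_ivector(H, X, W_ubm, T, ridge=1e-6):
    """Estimate subspace coordinates w from one speaker's data.

    H     : (N, h) activations of the last nonlinear hidden layer
    X     : (N, d) reconstruction targets (the AANN input features)
    W_ubm : (h, d) UBM output-layer weights
    T     : (h*d, r) fixed subspace basis
    Assumption: linear output layer, ridge-regularised squared error.
    """
    h, d = W_ubm.shape
    r = T.shape[1]
    # Residual left over after the UBM reconstruction.
    R = X - H @ W_ubm
    # Column k is the effect of basis direction k on the flattened output.
    A = np.stack(
        [(H @ T[:, k].reshape(h, d)).ravel() for k in range(r)], axis=1
    )
    # Normal equations with a small ridge term for numerical stability.
    return np.linalg.solve(A.T @ A + ridge * np.eye(r), A.T @ R.ravel())

# Synthetic check: recover a known coordinate vector from noiseless data.
rng = np.random.default_rng(0)
N, h, d, r = 200, 8, 5, 3
H = np.tanh(rng.standard_normal((N, h)))
W_ubm = rng.standard_normal((h, d))
T = rng.standard_normal((h * d, r))
w_true = rng.standard_normal(r)
X = H @ adapted_weights(W_ubm, T, w_true)
w_est = estimate_ivector(H, X, W_ubm, T)
```

Only `r` numbers are estimated per speaker instead of the full `h * d` weight matrix, which is the overfitting argument the abstract makes for limited speaker data.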
Pages: 522-528
Number of pages: 7
Related papers
20 in total
  • [1] [Anonymous], OD SPEAK LANG REC WO
  • [2] [Anonymous], 2008, ICSI QUICKNET SOFTWA
  • [3] Neural Learning Circuits Utilizing Nano-Crystalline Silicon Transistors and Memristors
    Cantley, Kurtis D.
    Subramaniam, Anand
    Stiegler, Harvey J.
    Chapman, Richard A.
    Vogel, Eric M.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2012, 23 (04) : 565 - 573
  • [4] Equilibria of Perceptrons for Simple Contingency Problems
    Dawson, Michael R. W.
    Dupuis, Brian
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2012, 23 (08) : 1340 - 1344
  • [5] Modeling prosodic features with joint factor analysis for speaker verification
    Dehak, Najim
    Dumouchel, Pierre
    Kenny, Patrick
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) : 2095 - 2103
  • [6] Front-End Factor Analysis for Speaker Verification
    Dehak, Najim
    Kenny, Patrick J.
    Dehak, Reda
    Dumouchel, Pierre
    Ouellet, Pierre
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 788 - 798
  • [7] Ganapathy S, 2011, INT CONF ACOUST SPEE, P4836
  • [8] Garcia D., 2010, 19 INT C ELECT MACHI, P1
  • [9] Garcia-Romero D., 2011, P INTERSPEECH, P1
  • [10] COMPARISON OF SCORING METHODS USED IN SPEAKER RECOGNITION WITH JOINT FACTOR ANALYSIS
    Glembek, Ondrej
    Burget, Lukas
    Dehak, Najim
    Bruemmer, Niko
    Kenny, Patrick
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4057 - +