Factor Analysis of Auto-Associative Neural Networks With Application in Speaker Verification

Cited by: 14

Authors:

Garimella, Sri [1]

Hermansky, Hynek [1]

Affiliations:

[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2013 / Vol. 24 / Issue 04

Keywords:

Factor analysis; i-vector; neural networks; speaker verification;

DOI:

10.1109/TNNLS.2012.2236652

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

An auto-associative neural network (AANN) is a fully connected feed-forward neural network trained to reconstruct its input at its output through a hidden compression layer that has fewer nodes than the dimensionality of the input. AANNs are used to model speakers in speaker verification, where a speaker-specific AANN model is obtained by adapting (or retraining) the universal background model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker's data. When the amount of speaker data is limited, this adaptation procedure may lead to overfitting, as all the parameters of the UBM-AANN are adapted. In this paper, we introduce and develop the factor analysis theory of AANNs to alleviate this problem. We hypothesize that only the weight matrix connecting the last nonlinear hidden layer and the output layer is speaker-specific, and further restrict it to a common low-dimensional subspace during adaptation. The subspace is learned using large amounts of development data and is held fixed during adaptation. Thus, only the coordinates in the subspace, also known as the i-vector, need to be estimated using speaker-specific data. The update equations are derived for learning both the common low-dimensional subspace and the i-vectors corresponding to speakers in the subspace. The resultant i-vector representation is used as a feature for the probabilistic linear discriminant analysis model. The proposed system shows promising results on the NIST-08 speaker recognition evaluation (SRE) and yields a 23% relative improvement in equal error rate over the previously proposed weighted least-squares-based subspace AANN system. The experiments on NIST-10 SRE confirm that these improvements are consistent and generalize across datasets.
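
The subspace restriction described in the abstract can be illustrated with a small sketch. The paper's own update equations are not reproduced in this record, so the following is not the authors' method: it assumes a linear output layer, a squared reconstruction error, and a ridge penalty standing in for a prior on the coordinates. All names (`estimate_ivector`, `H`, `W0`, `T`, `ridge`) are hypothetical. The speaker-specific output weights are modeled as `W0 + sum_k w[k] * T[k]`, and only the low-dimensional vector `w` is estimated from the speaker's data.

```python
import numpy as np

def estimate_ivector(H, X, W0, b, T, ridge=1.0):
    """Estimate subspace coordinates w (an i-vector-like vector) for one speaker.

    Illustrative assumptions, not taken from the paper:
      H  : (N, d_h) activations of the last nonlinear hidden layer
      X  : (N, d_x) reconstruction targets (the AANN inputs)
      W0 : (d_x, d_h) output-layer weights of the UBM-AANN
      b  : (d_x,) output-layer bias, kept fixed
      T  : (K, d_x, d_h) fixed basis matrices spanning the subspace
    The adapted weights are W0 + sum_k w[k] * T[k].
    """
    # Reconstruction error left over by the unadapted UBM-AANN output layer.
    R = X - H @ W0.T - b
    # Column k holds the flattened output change produced by basis matrix T[k].
    A = np.stack([(H @ Tk.T).ravel() for Tk in T], axis=1)
    K = A.shape[1]
    # Ridge-regularized least squares for the K coordinates.
    return np.linalg.solve(A.T @ A + ridge * np.eye(K), A.T @ R.ravel())
```

Because only `K` coordinates are fitted rather than the full `d_x * d_h` weight matrix, the number of free parameters stays small when speaker data is scarce, which is the overfitting concern the abstract raises.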


Pages: 522-528

Page count: 7
