A study of interspeaker variability in speaker verification

Cited by: 423
Authors
Kenny, Patrick [1]
Ouellet, Pierre [1]
Dehak, Najim [1]
Gupta, Vishwa [1]
Dumouchel, Pierre [1]
Affiliations
[1] Ctr Rech Informat Montreal, Montreal, PQ H3A 1B9, Canada
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008 / Vol. 16 / No. 05
Keywords
channel factors; Gaussian mixture model (GMM); speaker factors; speaker verification;
DOI
10.1109/TASL.2008.925147
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a new approach to the problem of estimating the hyperparameters which define the interspeaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10%-15% reductions in error rates on the core condition and the extended data condition (as measured both by equal error rates and the NIST detection cost function). We show that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types. (The comparisons are based on the best results on these tasks that have been reported in the literature.) In the case of the cross-channel condition, a factor analysis model with 300 speaker factors and 200 channel factors can achieve equal error rates of less than 3.0%. This is a substantial improvement over the best results that have previously been reported on this task.
Pages: 980-988
Page count: 9
Related papers
36 in total
[1]  
[Anonymous], P EUROSPEECH 03
[2]  
[Anonymous], NIST YEAR 2006 SPEAK
[3]  
[Anonymous], 2007, P INTERSPEECH
[4]  
Aronowitz H., 2007, P INT 07 ANTW BELG A, P298
[5]  
Bilmes JA, 2004, IMA VOL MATH APPL, V138, P191
[6]   Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006 [J].
Bruemmer, Niko ;
Burget, Lukas ;
Cernocky, Jan 'Honza' ;
Glembek, Ondrej ;
Grezl, Frantisek ;
Karafiat, Martin ;
van Leeuwen, David A. ;
Matejka, Pavel ;
Schwarz, Petr ;
Strasheim, Albert .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :2072-2084
[7]   Analysis of feature extraction and channel compensation in a GMM speaker recognition system [J].
Burget, Lukas ;
Matejka, Pavel ;
Schwarz, Petr ;
Glembek, Ondrej ;
Cernocky, Jan 'Honza' .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :1979-1986
[8]  
CAMPBELL WM, 2007, P ICASSP 07 HON HI, P217
[9]   Compensation of nuisance factors for speaker and language recognition [J].
Castaldo, Fabio ;
Colibro, Daniele ;
Dalmasso, Emanuele ;
Laface, Pietro ;
Vair, Claudio .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :1969-1978
[10]   Modeling prosodic features with joint factor analysis for speaker verification [J].
Dehak, Najim ;
Dumouchel, Pierre ;
Kenny, Patrick .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :2095-2103