Addressing the Data-Imbalance Problem in Kernel-based Speaker Verification via Utterance Partitioning and Speaker Comparison

Cited by: 0
Authors
Rao, Wei [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
speaker verification; GMM-SVM; speaker comparison; NIST SRE; utterance partitioning; data imbalance; VARIABILITY;
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
GMM-SVM has become a promising approach to text-independent speaker verification. However, a problematic issue of this approach is the extremely serious imbalance between the numbers of speaker-class and impostor-class utterances available for training the speaker-dependent SVMs. This data-imbalance problem can be addressed by (1) creating more speaker-class supervectors for SVM training through utterance partitioning with acoustic vector resampling (UP-AVR) and (2) avoiding the SVM training so that speaker scores are formulated as an inner product discriminant function (IPDF) between the target-speaker's supervector and test supervector. This paper highlights the differences between these two approaches and compares the effect of using different kernels - including the KL divergence kernel, GMM-UBM mean interval (GUMI) kernel and geometric-mean-comparison kernel - on their performance. Experiments on the NIST 2010 Speaker Recognition Evaluation suggest that GMM-SVM with UP-AVR is superior to speaker comparison and that the GUMI kernel is slightly better than the KL kernel in speaker comparison.
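The two approaches contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes diagonal-covariance GMMs, takes the MAP-adapted mean vectors as already computed, and uses the commonly cited form of the KL divergence kernel, the sum over mixtures of w_i * m_a_i^T * Sigma_i^{-1} * m_b_i. The function names `partition_utterance` and `kl_kernel_score` and all parameter names are hypothetical.

```python
import numpy as np


def partition_utterance(frames, n_parts, n_resamplings, rng):
    """Sketch of utterance partitioning with acoustic vector resampling.

    frames: (T, D) array of acoustic vectors from one enrollment utterance.
    Returns the full utterance plus n_parts sub-utterances for each of the
    n_resamplings random reorderings, i.e. n_resamplings * n_parts + 1
    segments. Each segment would then be mapped to its own supervector
    (that adaptation step is not shown here).
    """
    segments = [frames]
    for _ in range(n_resamplings):
        # Randomize the frame order, then cut into equal-length pieces.
        shuffled = frames[rng.permutation(len(frames))]
        segments.extend(np.array_split(shuffled, n_parts))
    return segments


def kl_kernel_score(means_a, means_b, weights, variances):
    """Inner-product score between two sets of adapted GMM means.

    means_a, means_b: (C, D) adapted means of the target and test utterances.
    weights: (C,) UBM mixture weights; variances: (C, D) UBM diagonal variances.
    Assumed form: sum_i w_i * m_a_i^T Sigma_i^{-1} m_b_i.
    """
    return float(np.sum(weights[:, None] * means_a * means_b / variances))
```

With 2 partitions and 3 resamplings, one enrollment utterance yields 7 segments instead of 1, which is how partitioning raises the number of speaker-class training vectors. The scoring function, by contrast, needs no SVM training at all: it compares the target and test supervectors directly.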
Pages: 2728-2731
Number of pages: 4
Related papers
11 in total
[1]  
[Anonymous], P APSIPA ASC 2010 SI
[2]   Score normalization for text-independent speaker verification systems [J].
Auckenthaler, R ;
Carey, M ;
Lloyd-Thomas, H .
DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) :42-54
[3]  
Campbell WM, 2006, INT CONF ACOUST SPEE, P97
[4]  
Campbell W. M., 2009, ADV NEURAL INFORM PR, V22, P207
[5]  
Campbell W. M., 2010, P INT 2010 JAP
[6]   A study of interspeaker variability in speaker verification [J].
Kenny, Patrick ;
Ouellet, Pierre ;
Dehak, Najim ;
Gupta, Vishwa ;
Dumouchel, Pierre .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (05) :980-988
[7]   Utterance partitioning with acoustic vector resampling for GMM-SVM speaker verification [J].
Mak, Man-Wai ;
Rao, Wei .
SPEECH COMMUNICATION, 2011, 53 (01) :119-130
[8]  
Pelecanos J., 2001, Proc. Speaker Odyssey, V13, P1
[9]   Speaker verification using adapted Gaussian mixture models [J].
Reynolds, DA ;
Quatieri, TF ;
Dunn, RB .
DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) :19-41
[10]   GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition [J].
You, Chang Huai ;
Lee, Kong Aik ;
Li, Haizhou .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) :1300-1312