Addressing the Data-Imbalance Problem in Kernel-based Speaker Verification via Utterance Partitioning and Speaker Comparison

Times cited: 0

Authors:

Rao, Wei [1]

Mak, Man-Wai [1]

Affiliations:

[1] The Hong Kong Polytechnic University, Department of Electronic and Information Engineering, Hong Kong, China

Source:

12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5 | 2011

Keywords:

speaker verification; GMM-SVM; speaker comparison; NIST SRE; utterance partitioning; data imbalance; variability

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

GMM-SVM has become a promising approach to text-independent speaker verification. However, the approach suffers from a severe imbalance between the numbers of speaker-class and impostor-class utterances available for training the speaker-dependent SVMs. This data-imbalance problem can be addressed by (1) creating more speaker-class supervectors for SVM training through utterance partitioning with acoustic vector resampling (UP-AVR), or (2) avoiding SVM training altogether, so that speaker scores are formulated as an inner product discriminant function (IPDF) between the target speaker's supervector and the test supervector. This paper highlights the differences between these two approaches and compares the effect of different kernels on their performance, including the KL divergence kernel, the GMM-UBM mean interval (GUMI) kernel, and the geometric-mean-comparison kernel. Experiments on the NIST 2010 Speaker Recognition Evaluation suggest that GMM-SVM with UP-AVR is superior to speaker comparison and that the GUMI kernel is slightly better than the KL kernel in speaker comparison.
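The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `up_avr` and `inner_product_score`, the parameters `num_partitions` and `num_resamplings`, and the toy frame values are all hypothetical. The sketch assumes that UP-AVR shuffles the order of an utterance's acoustic vectors, splits them into equal-sized sub-utterances, and repeats this several times; the step that turns each sub-utterance into a GMM supervector (adapting a universal background model) is omitted. The scoring function is a plain dot product standing in for the IPDF; the kernel-specific scaling that distinguishes the KL, GUMI, and geometric-mean-comparison kernels is not modeled.

```python
import random


def up_avr(frames, num_partitions=4, num_resamplings=2, seed=0):
    """Return the full utterance plus randomized sub-utterances.

    Assumption: each resampling pass shuffles the frame order and then
    splits the frames into `num_partitions` disjoint sub-utterances.
    """
    rng = random.Random(seed)
    sub_utterances = [list(frames)]  # keep the full-length utterance too
    for _ in range(num_resamplings):
        order = list(range(len(frames)))
        rng.shuffle(order)  # resampling step: randomize the frame order
        for k in range(num_partitions):
            # Taking every N-th shuffled index yields N disjoint,
            # near-equal-sized partitions that together cover all frames.
            indices = order[k::num_partitions]
            sub_utterances.append([frames[i] for i in indices])
    return sub_utterances


def inner_product_score(target_supervector, test_supervector):
    """Score a trial as an unweighted inner product (no SVM training)."""
    return sum(a * b for a, b in zip(target_supervector, test_supervector))


# Toy data: ten one-dimensional "acoustic vectors" for one enrollment utterance.
frames = [float(i) for i in range(10)]
subs = up_avr(frames, num_partitions=4, num_resamplings=2)
print(len(subs))  # 2 passes * 4 partitions + 1 full utterance = 9
```

Under these assumptions, a single enrollment utterance yields nine speaker-class inputs instead of one, which is how partitioning narrows the gap to the much larger impostor class. The inner-product route sidesteps the imbalance by not training a per-speaker classifier at all.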


Pages: 2728-2731

Page count: 4
