Utterance partitioning with acoustic vector resampling for GMM-SVM speaker verification

Cited by: 28

Authors:

Mak, Man-Wai [1]

Rao, Wei [1]

Affiliation:

[1] Hong Kong Polytech Univ, Elect & Informat Engn Dept, Ctr Signal Proc, Hong Kong, Hong Kong, Peoples R China

Source:

SPEECH COMMUNICATION | 2011 / Vol. 53 / Issue 01

Keywords:

Speaker verification; GMM-supervectors (GSV); Utterance partitioning; GMM-SVM; Support vector machine; Random resampling; Data imbalance; MACHINES; ENSEMBLE;

DOI:

10.1016/j.specom.2010.06.011

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Recent research has demonstrated the merit of combining Gaussian mixture models and support vector machine (SVM) for text-independent speaker verification. However, one unaddressed issue in this GMM-SVM approach is the imbalance between the numbers of speaker-class utterances and impostor-class utterances available for training a speaker-dependent SVM. This paper proposes a resampling technique - namely utterance partitioning with acoustic vector resampling (UP-AVR) - to mitigate the data imbalance problem. Briefly, the sequence order of acoustic vectors in an enrollment utterance is first randomized, which is followed by partitioning the randomized sequence into a number of segments. Each of these segments is then used to produce a GMM supervector via MAP adaptation and mean vector concatenation. The randomization and partitioning processes are repeated several times to produce a sufficient number of speaker-class supervectors for training an SVM. Experimental evaluations based on the NIST 2002 and 2004 SRE suggest that UP-AVR can reduce the error rate of GMM-SVM systems. (C) 2010 Elsevier B.V. All rights reserved.
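
The procedure outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the default numbers of partitions and resamplings, the diagonal-covariance background model, and the relevance factor of 16 are all assumptions made here for illustration; the abstract specifies only the randomize, partition, MAP-adapt, and concatenate steps.

```python
import numpy as np


def up_avr_segments(frames, num_partitions=4, num_resamplings=3, seed=0):
    """Shuffle the frame order, split it into segments, and repeat.

    frames: (T, D) array of acoustic vectors from one enrollment utterance.
    Returns num_partitions * num_resamplings arrays of frames.
    (Parameter names and defaults are hypothetical.)
    """
    rng = np.random.default_rng(seed)
    segments = []
    for _ in range(num_resamplings):
        order = rng.permutation(len(frames))            # randomize sequence order
        for idx in np.array_split(order, num_partitions):
            segments.append(frames[idx])                # one segment of frames
    return segments


def map_adapted_supervector(segment, weights, means, variances, relevance=16.0):
    """Mean-only relevance MAP adaptation, then mean concatenation.

    weights: (C,), means: (C, D), variances: (C, D) of a background GMM
    with diagonal covariances (an assumption of this sketch).
    """
    diff = segment[:, None, :] - means[None, :, :]      # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                   # (T, C)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)     # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)             # component posteriors
    n = post.sum(axis=0)                                # soft counts, (C,)
    expected = (post.T @ segment) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]              # adaptation coefficients
    adapted = alpha * expected + (1.0 - alpha) * means
    return adapted.reshape(-1)                          # stacked means, (C * D,)
```

Each array returned by `up_avr_segments` would be passed to `map_adapted_supervector`, and the resulting vectors would serve as the speaker-class training examples for the SVM, alongside impostor-class supervectors.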


Pages: 119-130

Number of pages: 12
