Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

Times cited: 69

Authors:

Bocklet, Tobias [1]

Maier, Andreas [1]

Bauer, Josef G. [2]

Burkhardt, Felix [3]

Noeth, Elmar [1]

Affiliations:

[1] Univ Erlangen Nurnberg, Inst Pattern Recognit, D-8520 Erlangen, Germany

[2] Siemens AG, Munich CT IC5, Germany

[3] SSC ENPS, T Syst Enterprise Serv GmbH, Berlin, Germany

Source:

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008

Keywords:

acoustic signal analysis; speaker classification; age; gender; Gaussian Mixture Models (GMM); support vector machine (SVM)

DOI:

10.1109/ICASSP.2008.4517932

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This paper compares two approaches to automatic age and gender classification with seven classes. The first approach uses Gaussian Mixture Models (GMMs) with Universal Background Models (UBMs), a method well known from speaker identification and verification. The models are trained with the EM algorithm or by MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, yielding one GMM supervector per speaker. These supervectors are then classified with a support vector machine (SVM). Three kernels were employed for the SVM: a polynomial kernel (with different polynomials), an RBF kernel, and a linear GMM distance kernel based on the Kullback-Leibler (KL) divergence. The SVM approach improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
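The supervector pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes diagonal-covariance GMMs, assumes each speaker GMM is obtained by MAP-adapting only the UBM means (a common choice for supervectors; the abstract does not say how the per-speaker GMMs are trained), and uses a relevance factor of 16 that is not taken from the paper. The KL-based linear kernel follows the form given by Campbell et al. [3], K = Σ_i w_i μ_i^a' Σ_i^{-1} μ_i^b; the paper's exact kernel may differ. All function names, kernel hyperparameters, and toy UBM values are hypothetical, and the SVM training step itself is not shown.

```python
import math

# Toy sketch of a GMM-supervector pipeline. Assumptions (not from the paper):
# diagonal covariances, mean-only MAP adaptation, relevance factor r = 16,
# hypothetical kernel hyperparameters, and all parameter values used below.


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, w, mu, var):
    """Posterior probability of each UBM component given frame x."""
    logs = [math.log(wi) + log_gauss_diag(x, m, v) for wi, m, v in zip(w, mu, var)]
    top = max(logs)  # subtract the max (log-sum-exp trick) to avoid underflow
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, w, mu, var, relevance=16.0):
    """Adapt only the UBM means to one speaker's frames (weights/variances kept)."""
    n_comp, dim = len(w), len(mu[0])
    counts = [0.0] * n_comp                        # soft counts n_i
    firsts = [[0.0] * dim for _ in range(n_comp)]  # sum_t gamma_i(x_t) * x_t
    for x in frames:
        for i, g in enumerate(posteriors(x, w, mu, var)):
            counts[i] += g
            for d in range(dim):
                firsts[i][d] += g * x[d]
    adapted = []
    for i in range(n_comp):
        alpha = counts[i] / (counts[i] + relevance)  # data-dependent mixing weight
        data_mean = [firsts[i][d] / counts[i] if counts[i] > 0 else mu[i][d]
                     for d in range(dim)]
        adapted.append([alpha * data_mean[d] + (1.0 - alpha) * mu[i][d]
                        for d in range(dim)])
    return adapted


def supervector(means):
    """Concatenate the per-component mean vectors into one supervector."""
    return [value for mean in means for value in mean]


def kl_linear_kernel(means_a, means_b, w, var):
    """Campbell et al.-style kernel: sum_i w_i * mu_a_i' Sigma_i^-1 mu_b_i."""
    return sum(
        wi * sum(ma * mb / v for ma, mb, v in zip(mean_a, mean_b, vi))
        for wi, mean_a, mean_b, vi in zip(w, means_a, means_b, var)
    )


def poly_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel on raw supervectors (degree and offset are hypothetical)."""
    return (sum(x * y for x, y in zip(a, b)) + c) ** degree


def rbf_kernel(a, b, gamma=0.1):
    """RBF kernel on raw supervectors (gamma is hypothetical)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical two-component, two-dimensional UBM and one speaker's frames.
w = [0.5, 0.5]
mu = [[0.0, 0.0], [4.0, 4.0]]
var = [[1.0, 1.0], [1.0, 1.0]]
frames = [[1.0, 1.0]] * 10

adapted = map_adapt_means(frames, w, mu, var)
sv = supervector(adapted)
print(len(sv))  # 4 (2 components x 2 dimensions)
print(kl_linear_kernel(mu, mu, w, var))  # 16.0
```

Because the KL-based kernel only reweights each mean by sqrt(w_i) and the inverse standard deviations, it is equivalent to an ordinary dot product on suitably scaled supervectors, so a standard linear SVM can use it directly; the polynomial and RBF kernels here operate on the unscaled supervectors. Only the component that explains a speaker's frames moves noticeably during adaptation, which is what makes the concatenated means speaker-specific.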

References

Pages: 1605 / +

Number of pages: 2

12 references in total

[1] [Anonymous], P EUR GEN SWITZ

[2] Burges, CJC. A tutorial on Support Vector Machines for pattern recognition [J]. Data Mining and Knowledge Discovery, 1998, 2(02): 121-167

[3] Campbell, WM; Sturim, DE; Reynolds, DA. Support vector machines using GMM supervectors for speaker verification [J]. IEEE Signal Processing Letters, 2006, 13(05): 308-311

[4] Cowie, R; Douglas-Cowie, E; Tsapatsoulis, N; Votsis, G; Kollias, S; Fellenz, W; Taylor, JG. Emotion recognition in human-computer interaction [J]. IEEE Signal Processing Magazine, 2001, 18(01): 32-80

[5] Dehak, R, 2007, P INT 2007 ANT BELG

[6] Dempster, AP; Laird, NM; Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm [J]. Journal of the Royal Statistical Society Series B (Methodological), 1977, 39(01): 1-38

[7] Douglas, A, 2002, ICASSP 2002 P IEEE I, V4, P4072

[8] Gauvain, Jean-Luc; Lee, Chin-Hui. Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J]. IEEE Transactions on Speech and Audio Processing, 1994, 2(02): 291-298

[9] Metze, F, 2007, INT CONF ACOUST SPEE, P1089

[10] Muller, C, 2003, P 8 EUR C SPEECH COM, V3, P1305
