Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

Cited by: 69
Authors
Bocklet, Tobias [1 ]
Maier, Andreas [1 ]
Bauer, Josef G. [2 ]
Burkhardt, Felix [3 ]
Noeth, Elmar [1 ]
Affiliations
[1] Univ Erlangen Nurnberg, Inst Pattern Recognit, D-8520 Erlangen, Germany
[2] Siemens AG, Munich CT IC5, Germany
[3] SSC ENPS, T Syst Enterprise Serv GmbH, Berlin, Germany
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
acoustic signal analysis; speaker classification; age; gender; Gaussian Mixture Models (GMM); support vector machine (SVM);
DOI
10.1109/ICASSP.2008.4517932
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian Mixture Models (GMMs) with Universal Background Models (UBMs), a combination that is well known from speaker identification and verification. The models are trained with the EM algorithm or by MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker of the test and training sets. The means of each model are extracted and concatenated, which yields one GMM supervector per speaker. These supervectors are then used as input to a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (of different degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
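The supervector idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes diagonal covariances, mean-only MAP adaptation from a UBM, and the linear kernel form described by Campbell et al. in reference [3], where each mean is scaled by the square root of its mixture weight and the inverse standard deviation. The function names and the relevance factor of 16 are hypothetical.

```python
import math


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Adapt only the UBM means to one speaker's frames (diagonal covariances).

    `relevance` is a hypothetical relevance factor, not a value from the paper.
    """
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                         # soft counts per component
    acc = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted frame sums
    for x in frames:
        # Log of weight * diagonal Gaussian density for every component.
        logp = []
        for w, m, v in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, m, v):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            logp.append(ll)
        top = max(logp)
        post = [math.exp(l - top) for l in logp]  # numerically stable posteriors
        norm = sum(post)
        for i in range(n_comp):
            gamma = post[i] / norm
            occ[i] += gamma
            for d in range(dim):
                acc[i][d] += gamma * x[d]
    # Interpolate between the speaker's data and the UBM mean.
    return [
        [(acc[i][d] + relevance * means[i][d]) / (occ[i] + relevance)
         for d in range(dim)]
        for i in range(n_comp)
    ]


def supervector(adapted_means):
    """Concatenate all component means into one flat vector."""
    return [value for mean in adapted_means for value in mean]


def normalised_supervector(adapted_means, weights, variances):
    """Scale each mean entry by sqrt(weight) / sqrt(variance), then flatten."""
    return [
        math.sqrt(w) * md / math.sqrt(vd)
        for w, m, v in zip(weights, adapted_means, variances)
        for md, vd in zip(m, v)
    ]


def gmm_linear_kernel(means_a, means_b, weights, variances):
    """Dot product of two normalised supervectors (assumed kernel form)."""
    a = normalised_supervector(means_a, weights, variances)
    b = normalised_supervector(means_b, weights, variances)
    return sum(x * y for x, y in zip(a, b))
```

Because this kernel is an ordinary dot product of the normalised supervectors, those vectors could be passed to any SVM implementation configured with a linear kernel. The polynomial and RBF kernels mentioned in the abstract would instead operate on the plain output of `supervector`.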
Pages: 1605 / +
Page count: 2
Related papers
12 in total
[1]  
[Anonymous], P EUR GEN SWITZ
[2]   A tutorial on Support Vector Machines for pattern recognition [J].
Burges, CJC .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :121-167
[3]   Support vector machines using GMM supervectors for speaker verification [J].
Campbell, WM ;
Sturim, DE ;
Reynolds, DA .
IEEE SIGNAL PROCESSING LETTERS, 2006, 13 (05) :308-311
[4]   Emotion recognition in human-computer interaction [J].
Cowie, R ;
Douglas-Cowie, E ;
Tsapatsoulis, N ;
Votsis, G ;
Kollias, S ;
Fellenz, W ;
Taylor, JG .
IEEE SIGNAL PROCESSING MAGAZINE, 2001, 18 (01) :32-80
[5]  
DEHAK R, 2007, P INT 2007 ANT BELG
[6]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]  
DOUGLAS A, 2002, ICASSP 2002 P IEEE I, V4, P4072
[8]   Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J].
Gauvain, Jean-Luc ;
Lee, Chin-Hui .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02) :291-298
[9]  
Metze F, 2007, INT CONF ACOUST SPEE, P1089
[10]  
Muller C, 2003, P 8 EUR C SPEECH COM, V3, P1305