Improving Automatic Emotion Recognition from Speech Signals

Cited by: 0
Authors
Bozkurt, Elif [1 ]
Erzin, Engin [1 ]
Erdem, Cigdem Eroglu [2 ]
Erdem, A. Tanju [3 ]
Affiliations
[1] Koc Univ, Coll Engn, TR-34450 Istanbul, Turkey
[2] Bahcesehir Univ, Dept Elect & Elect Engn, Istanbul, Turkey
[3] Ozyegin Univ, Fac Engn, Istanbul, Turkey
Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Keywords
emotion recognition; prosody modeling;
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a speech-signal-driven emotion recognition system. Our system is trained and tested on the INTERSPEECH 2009 Emotion Challenge corpus, which contains spontaneous and emotionally rich recordings. The challenge comprises classifier and feature sub-challenges with five-class and two-class classification problems. We investigate prosody-related, spectral, and HMM-based features for emotion recognition with Gaussian mixture model (GMM) based classifiers. Spectral features consist of mel-frequency cepstral coefficients (MFCC), line spectral frequency (LSF) features, and their derivatives, whereas prosody-related features consist of mean-normalized values of pitch, the first derivative of pitch, and intensity. Unsupervised training of HMM structures is employed to define prosody-related temporal features for the emotion recognition problem. We also investigate data fusion of different features and decision fusion of different classifiers, both of which are not well studied in the emotion recognition framework. Experimental results of automatic emotion recognition on the INTERSPEECH 2009 Emotion Challenge corpus are presented.
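The GMM-based classification described in the abstract can be illustrated with a minimal sketch. For simplicity, each emotion class is modeled here with a single diagonal-covariance Gaussian, the one-component special case of a GMM, over synthetic "MFCC-like" vectors; the class names, feature dimension, and data below are hypothetical placeholders, not the paper's actual classifier or the challenge corpus.

```python
import numpy as np

def fit_class_models(features_by_class):
    """Estimate a diagonal Gaussian (mean, variance) per emotion class."""
    models = {}
    for label, feats in features_by_class.items():
        feats = np.asarray(feats)
        # Small floor on the variance avoids division by zero later.
        models[label] = (feats.mean(axis=0), feats.var(axis=0) + 1e-6)
    return models

def log_likelihood(x, mean, var):
    """Diagonal-Gaussian log-likelihood of one feature vector."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x, models):
    """Maximum-likelihood decision: pick the best-scoring class model."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))

rng = np.random.default_rng(0)
# Synthetic 13-dimensional feature vectors for two hypothetical classes.
train = {
    "neutral": rng.normal(0.0, 1.0, size=(200, 13)),
    "anger":   rng.normal(2.0, 1.0, size=(200, 13)),
}
models = fit_class_models(train)
print(classify(np.full(13, 2.0), models))  # → "anger"
```

A full GMM classifier would replace the single Gaussian per class with a multi-component mixture trained by expectation-maximization, and the decision-fusion variant studied in the paper would combine such per-feature-stream scores before taking the argmax.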
Pages: 312+
Page count: 2
References
(10 total)
[1] Burkhardt F., 2005, INTERSPEECH, V5, P1517, DOI 10.21437/INTERSPEECH.2005-446
[2] Deller J. R., 1993, DISCRETE TIME PROCES
[3] Erzin E., Yemez Y., Tekalp A. M. Multimodal speaker identification using an adaptive classifier cascade based on modality reliability [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (05): 840-852
[4] Grimm M., 2008, VERA MITTAG GERMAN A, P865
[5] Itakura F. Line spectrum representation of linear predictor coefficients of speech signals [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1975, 57: S35-S35
[6] Lee C. M., Narayanan S. S. Toward detecting emotions in spoken dialogs [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (02): 293-303
[7] Oudeyer P., 2003, INT J HUM-COMPUT ST, V59, P157, DOI 10.1016/S1071-5819(02)00141-6
[8] Scherer K. R. Proceedings of the 13th International Congress of Phonetic Sciences
[9] Schuller B., 2009, INTERSPEECH 2009
[10] Steidl S., 2009, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech