SPEECH EMOTION RECOGNITION WITH ACOUSTIC AND LEXICAL FEATURES

Cited by: 0
Authors
Jin, Qin [1,2]
Li, Chengxin [1]
Chen, Shizhe [1]
Wu, Huimin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Comp Sci Dept, Beijing, Peoples R China
[2] Renmin Univ China, Minist Educ, Key Lab Data Engn & Knowledge Engn, Beijing 100872, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
Beijing Natural Science Foundation;
Keywords
Emotion recognition; Acoustic features; Emotion lexicon; Lexical features; Support vector machine; Classification; Audio
DOI
Not available
CLC number
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
In this paper we explore one of the key aspects of building an emotion recognition system: generating suitable feature representations. We generate feature representations at both the acoustic and the lexical level. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then generate different acoustic feature representations based on these low-level features, including statistics over them, a new representation derived from a set of low-level acoustic codewords, and a new representation based on Gaussian supervectors. At the lexical level, we propose a new feature representation named emotion vector (eVector). We also use the traditional Bag-of-Words (BoW) feature. We apply these feature representations to emotion recognition and compare their performance on the USC-IEMOCAP database. We also combine the different feature representations via early fusion and late fusion. Our experimental results show that late fusion of acoustic and lexical features achieves a four-class emotion recognition accuracy of 69.2%.
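This record gives no implementation details for the acoustic front end, so the following is a minimal sketch of the general approach the abstract describes: pool frame-level low-level descriptors into fixed-length utterance-level statistics. It assumes librosa for F0 and intensity extraction; the paper's full descriptor set (jitter, shimmer, spectral contours) and its actual toolchain are not specified here.

```python
import numpy as np
import librosa

def utterance_statistics(wav_path, sr=16000):
    """Pool frame-level descriptors (F0, intensity proxy) into a
    fixed-length vector of utterance-level statistics."""
    y, _ = librosa.load(wav_path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=50, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]                    # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]         # frame energy as an intensity proxy
    stats = []
    for track in (f0, rms):
        if track.size == 0:                   # guard against fully unvoiced audio
            track = np.zeros(1)
        stats.extend([track.mean(), track.std(), track.min(), track.max()])
    return np.asarray(stats)                  # one fixed-length vector per utterance
```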
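The acoustic-codeword representation can likewise be sketched: cluster frame-level features from the training set into a codebook, then describe each utterance as a normalized histogram of codeword assignments. The codebook size (256) and the use of k-means are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(train_frames, n_codewords=256, seed=0):
    """Cluster pooled frame-level features into acoustic codewords.
    train_frames: array of shape (total_frames, feat_dim)."""
    return KMeans(n_clusters=n_codewords, random_state=seed).fit(train_frames)

def codeword_histogram(codebook, utterance_frames):
    """Represent one utterance as an L1-normalized histogram of its
    frames' nearest codewords."""
    assignments = codebook.predict(utterance_frames)
    hist = np.bincount(assignments, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)
```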
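A Gaussian-supervector representation is conventionally built by MAP-adapting the means of a universal background model (UBM) to each utterance and stacking the adapted means into one long vector. The component count and relevance factor below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(all_frames, n_components=64, seed=0):
    """Universal background model over frame-level features pooled
    from the whole training set."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=seed).fit(all_frames)

def gmm_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them."""
    post = ubm.predict_proba(frames)             # (T, K) responsibilities
    n_k = post.sum(axis=0)                       # soft frame counts per component
    f_k = post.T @ frames                        # first-order statistics, (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]   # data-dependent adaptation weights
    adapted = alpha * (f_k / np.maximum(n_k[:, None], 1e-8)) \
        + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                       # one (K * D)-dim supervector
```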
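The eVector construction is defined in the paper itself, not in this record, so the following is only a loose illustration of a lexicon-based lexical feature: count word matches against an emotion lexicon, one dimension per category. Everything here, including EMOTION_LEXICON and the category set, is hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper's actual eVector uses its own
# emotion lexicon and construction, which this record does not describe.
EMOTION_LEXICON = {"happy": "joy", "great": "joy",
                   "hate": "anger", "terrible": "anger",
                   "cry": "sadness", "alone": "sadness"}
CATEGORIES = ("joy", "anger", "sadness", "neutral")

def lexicon_vector(transcript):
    """Normalized counts of emotion-lexicon hits per category."""
    hits = Counter(EMOTION_LEXICON.get(w) for w in transcript.lower().split())
    total = max(sum(hits[c] for c in CATEGORIES), 1)
    return [hits[c] / total for c in CATEGORIES]
```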
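Finally, the late fusion the abstract reports can be sketched as training one SVM per feature view and combining their outputs; early fusion would instead concatenate the views (e.g., np.hstack) before training a single classifier. The unweighted probability averaging below is an assumption; the paper's exact fusion rule is not given in this record.

```python
import numpy as np
from sklearn.svm import SVC

def late_fusion_predict(views_train, y_train, views_test):
    """Train one SVM per feature view (acoustic or lexical) and average
    the per-class probabilities; assumes integer labels 0..n_classes-1."""
    probs = []
    for X_tr, X_te in zip(views_train, views_test):
        clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_train)
        probs.append(clf.predict_proba(X_te))
    return np.mean(probs, axis=0).argmax(axis=1)   # fused class decisions
```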
Pages: 4749-4753
Page count: 5