SPEECH EMOTION RECOGNITION WITH ACOUSTIC AND LEXICAL FEATURES

Cited by: 0
Authors
Jin, Qin [1,2]
Li, Chengxin [1]
Chen, Shizhe [1]
Wu, Huimin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Comp Sci Dept, Beijing, Peoples R China
[2] Renmin Univ China, Minist Educ, Key Lab Data Engn & Knowledge Engn, Beijing 100872, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
Beijing Natural Science Foundation;
Keywords
Emotion recognition; Acoustic features; Emotion lexicon; Lexical features; Support vector machine; Classification; Audio
DOI
Not available
CLC number
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
In this paper we explore one of the key aspects of building an emotion recognition system: generating suitable feature representations. We generate feature representations at both the acoustic and the lexical level. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then generate different acoustic feature representations based on these low-level features, including statistics over them, a new representation derived from a set of low-level acoustic codewords, and a new representation based on Gaussian supervectors. At the lexical level, we propose a new feature representation named emotion vector (eVector). We also use the traditional Bag-of-Words (BoW) feature. We apply these feature representations to emotion recognition and compare their performance on the USC-IEMOCAP database. We also combine the different feature representations via early fusion and late fusion. Our experimental results show that late fusion of acoustic and lexical features achieves a four-class emotion recognition accuracy of 69.2%.
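This record gives no implementation details for the acoustic front end, so the following is a minimal sketch of the general approach the abstract describes: pool frame-level low-level descriptors into fixed-length utterance-level statistics. It assumes librosa for F0 and intensity extraction; the paper's full descriptor set (jitter, shimmer, spectral contours) and its actual toolchain are not specified here.

```python
import numpy as np
import librosa

def utterance_statistics(wav_path, sr=16000):
    """Pool frame-level descriptors (F0, intensity proxy) into a
    fixed-length vector of utterance-level statistics."""
    y, _ = librosa.load(wav_path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=50, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]                    # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]         # frame energy as an intensity proxy
    stats = []
    for track in (f0, rms):
        if track.size == 0:                   # guard against fully unvoiced audio
            track = np.zeros(1)
        stats.extend([track.mean(), track.std(), track.min(), track.max()])
    return np.asarray(stats)                  # one fixed-length vector per utterance
```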
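The acoustic-codeword representation can likewise be sketched: cluster frame-level features from the training set into a codebook, then describe each utterance as a normalized histogram of codeword assignments. The codebook size (256) and the use of k-means are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(train_frames, n_codewords=256, seed=0):
    """Cluster pooled frame-level features into acoustic codewords.
    train_frames: array of shape (total_frames, feat_dim)."""
    return KMeans(n_clusters=n_codewords, random_state=seed).fit(train_frames)

def codeword_histogram(codebook, utterance_frames):
    """Represent one utterance as an L1-normalized histogram of its
    frames' nearest codewords."""
    assignments = codebook.predict(utterance_frames)
    hist = np.bincount(assignments, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)
```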
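A Gaussian-supervector representation is conventionally built by MAP-adapting the means of a universal background model (UBM) to each utterance and stacking the adapted means into one long vector. The component count and relevance factor below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(all_frames, n_components=64, seed=0):
    """Universal background model over frame-level features pooled
    from the whole training set."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=seed).fit(all_frames)

def gmm_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them."""
    post = ubm.predict_proba(frames)             # (T, K) responsibilities
    n_k = post.sum(axis=0)                       # soft frame counts per component
    f_k = post.T @ frames                        # first-order statistics, (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]   # data-dependent adaptation weights
    adapted = alpha * (f_k / np.maximum(n_k[:, None], 1e-8)) \
        + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                       # one (K * D)-dim supervector
```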
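The eVector construction is defined in the paper itself, not in this record, so the following is only a loose illustration of a lexicon-based lexical feature: count word matches against an emotion lexicon, one dimension per category. Everything here, including EMOTION_LEXICON and the category set, is hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper's actual eVector uses its own
# emotion lexicon and construction, which this record does not describe.
EMOTION_LEXICON = {"happy": "joy", "great": "joy",
                   "hate": "anger", "terrible": "anger",
                   "cry": "sadness", "alone": "sadness"}
CATEGORIES = ("joy", "anger", "sadness", "neutral")

def lexicon_vector(transcript):
    """Normalized counts of emotion-lexicon hits per category."""
    hits = Counter(EMOTION_LEXICON.get(w) for w in transcript.lower().split())
    total = max(sum(hits[c] for c in CATEGORIES), 1)
    return [hits[c] / total for c in CATEGORIES]
```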
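Finally, the late fusion the abstract reports can be sketched as training one SVM per feature view and combining their outputs; early fusion would instead concatenate the views (e.g., np.hstack) before training a single classifier. The unweighted probability averaging below is an assumption; the paper's exact fusion rule is not given in this record.

```python
import numpy as np
from sklearn.svm import SVC

def late_fusion_predict(views_train, y_train, views_test):
    """Train one SVM per feature view (acoustic or lexical) and average
    the per-class probabilities; assumes integer labels 0..n_classes-1."""
    probs = []
    for X_tr, X_te in zip(views_train, views_test):
        clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_train)
        probs.append(clf.predict_proba(X_te))
    return np.mean(probs, axis=0).argmax(axis=1)   # fused class decisions
```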
Pages: 4749-4753
Page count: 5