Speech Emotion Recognition Based on a Recurrent Neural Network Classification Model

Times Cited: 0
Authors
Fonnegra, Ruben D. [1]
Diaz, Gloria M. [1]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
Source
ADVANCES IN COMPUTER ENTERTAINMENT TECHNOLOGY, ACE 2017 | 2018, Vol. 10714
Keywords
Speech emotion recognition; Audio signals; Deep learning
DOI
10.1007/978-3-319-76270-8_59
Chinese Library Classification
TP39 [Computer applications]
Subject Classification Codes
081203; 0835
Abstract
Affective computing remains one of the most active areas of study for developing better human-machine interactions. Speech emotion recognition, in particular, is widely used because it is feasible to implement. In this paper, we investigate the discriminative capabilities of recurrent neural networks for human emotion analysis based on low-level acoustic descriptors extracted from speech signals. The proposed approach starts by extracting 1580 features from the audio signal using the well-known openSMILE toolbox. These features are then fed to a recurrent Long Short-Term Memory (LSTM) neural network, which is trained to decide the emotional content of the evaluated utterance. Performance was evaluated in two experiments: gender-independent and gender-dependent classification. Experimental results show that the proposed approach achieves 92% emotion recognition accuracy in the gender-independent experiment, outperforming previous works on the same experimental data. In the gender-dependent experiment, accuracy was 94.3% for men and 84.4% for women.
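The pipeline the abstract describes (per-utterance acoustic feature vectors fed to an LSTM that emits an emotion label) can be sketched as follows. This is a minimal illustrative forward pass only, not the authors' implementation: the weights are random rather than trained, the hidden size and number of emotion classes are assumed, and only the 1580-dimensional feature size comes from the paper.

```python
import numpy as np

def lstm_emotion_scores(frames, n_emotions=6, hidden=32, seed=0):
    """Minimal LSTM forward pass over per-frame acoustic feature vectors.

    `frames` is (T, F): T time steps of F low-level descriptors (e.g.
    openSMILE output). Weights are random here for illustration; in the
    paper they would be learned. Layer sizes are assumptions.
    """
    rng = np.random.default_rng(seed)
    T, F = frames.shape
    # One weight matrix and bias per LSTM gate: input, forget, cell, output.
    W = {g: rng.normal(0.0, 0.1, (hidden, F + hidden)) for g in "ifco"}
    b = {g: np.zeros(hidden) for g in "ifco"}
    Wy = rng.normal(0.0, 0.1, (n_emotions, hidden))  # classification head

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x in frames:
        z = np.concatenate([x, h])          # current input + previous hidden
        i = sigmoid(W["i"] @ z + b["i"])    # input gate
        f = sigmoid(W["f"] @ z + b["f"])    # forget gate
        g = np.tanh(W["c"] @ z + b["c"])    # candidate cell update
        o = sigmoid(W["o"] @ z + b["o"])    # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = Wy @ h                          # classify from the last hidden state
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax over emotion classes

# Usage: 50 frames of 1580-dim features (the paper's feature dimensionality).
probs = lstm_emotion_scores(np.random.default_rng(1).normal(size=(50, 1580)))
```

The returned vector is a probability distribution over emotion classes; in training, its cross-entropy against the utterance's label would drive the weight updates.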
Pages: 882-892
Page count: 11