Speech Emotion Recognition Based on a Recurrent Neural Network Classification Model

Times Cited: 0
Authors
Fonnegra, Ruben D. [1 ]
Diaz, Gloria M. [1 ]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
Source
ADVANCES IN COMPUTER ENTERTAINMENT TECHNOLOGY, ACE 2017 | 2018 / Vol. 10714
Keywords
Speech emotion recognition; Audio signals; Deep learning
DOI
10.1007/978-3-319-76270-8_59
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Codes
081203; 0835
Abstract
Affective computing remains one of the most active areas of study for developing better human-machine interaction. Speech emotion recognition, in particular, is widely used because of its feasibility of implementation. In this paper, we investigate the discriminative capabilities of recurrent neural networks for human emotion analysis from low-level acoustic descriptors extracted from speech signals. The proposed approach starts by extracting 1580 features from the audio signal using the well-known openSMILE toolbox. These features are then fed into a recurrent Long Short-Term Memory (LSTM) neural network, which is trained to decide the emotional content of the evaluated utterance. Performance evaluation was conducted through two experiments: gender-independent and gender-dependent classification. Experimental results show that the proposed approach achieves 92% emotion recognition accuracy in the gender-independent experiment, outperforming previous works on the same experimental data. In the gender-dependent experiment, accuracy was 94.3% for men and 84.4% for women.
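To make the pipeline described in the abstract concrete, the following is a minimal sketch of the classification stage, not the authors' implementation: an LSTM that consumes a sequence of acoustic feature vectors and outputs emotion logits. It assumes the 1580 openSMILE features have already been extracted per utterance and framed as a sequence; the hidden size, frame count, and number of emotion classes are illustrative assumptions, since the full paper is not reproduced here.

```python
# Minimal sketch of the LSTM classification stage (not the authors' code).
# Assumes each utterance arrives as a sequence of 1580-dimensional
# openSMILE feature vectors; hidden size and class count are assumptions.
import torch
import torch.nn as nn

NUM_FEATURES = 1580  # descriptor count reported in the abstract
NUM_EMOTIONS = 6     # assumed; depends on the corpus used

class EmotionLSTM(nn.Module):
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        # The LSTM reads the acoustic descriptor sequence frame by frame.
        self.lstm = nn.LSTM(NUM_FEATURES, hidden_size, batch_first=True)
        # A linear head maps the final hidden state to emotion logits.
        self.head = nn.Linear(hidden_size, NUM_EMOTIONS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, NUM_FEATURES).
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

# Toy usage with random tensors standing in for real openSMILE output.
model = EmotionLSTM()
dummy = torch.randn(4, 20, NUM_FEATURES)  # 4 utterances, 20 frames each
print(model(dummy).shape)  # torch.Size([4, 6])
```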
Pages: 882-892 (11 pages)