Speech Emotion Recognition Based on a Recurrent Neural Network Classification Model

Times Cited: 0
Authors
Fonnegra, Ruben D. [1 ]
Diaz, Gloria M. [1 ]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
Source
ADVANCES IN COMPUTER ENTERTAINMENT TECHNOLOGY, ACE 2017 | 2018 / Vol. 10714
Keywords
Speech emotion recognition; Audio signals; Deep learning
DOI
10.1007/978-3-319-76270-8_59
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Codes
081203; 0835
Abstract
Affective computing remains one of the most active areas of study for developing better human-machine interaction. Speech emotion recognition, in particular, is widely used because of its feasibility of implementation. In this paper, we investigate the discriminative capabilities of recurrent neural networks for human emotion analysis from low-level acoustic descriptors extracted from speech signals. The proposed approach starts by extracting 1580 features from the audio signal using the well-known openSMILE toolbox. These features are then fed into a recurrent Long Short-Term Memory (LSTM) neural network, which is trained to decide the emotional content of the evaluated utterance. Performance evaluation was conducted through two experiments: gender-independent and gender-dependent classification. Experimental results show that the proposed approach achieves 92% emotion recognition accuracy in the gender-independent experiment, outperforming previous works on the same experimental data. In the gender-dependent experiment, accuracy was 94.3% for men and 84.4% for women.
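To make the pipeline described in the abstract concrete, the following is a minimal sketch of the classification stage, not the authors' implementation: an LSTM that consumes a sequence of acoustic feature vectors and outputs emotion logits. It assumes the 1580 openSMILE features have already been extracted per utterance and framed as a sequence; the hidden size, frame count, and number of emotion classes are illustrative assumptions, since the full paper is not reproduced here.

```python
# Minimal sketch of the LSTM classification stage (not the authors' code).
# Assumes each utterance arrives as a sequence of 1580-dimensional
# openSMILE feature vectors; hidden size and class count are assumptions.
import torch
import torch.nn as nn

NUM_FEATURES = 1580  # descriptor count reported in the abstract
NUM_EMOTIONS = 6     # assumed; depends on the corpus used

class EmotionLSTM(nn.Module):
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        # The LSTM reads the acoustic descriptor sequence frame by frame.
        self.lstm = nn.LSTM(NUM_FEATURES, hidden_size, batch_first=True)
        # A linear head maps the final hidden state to emotion logits.
        self.head = nn.Linear(hidden_size, NUM_EMOTIONS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, NUM_FEATURES).
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

# Toy usage with random tensors standing in for real openSMILE output.
model = EmotionLSTM()
dummy = torch.randn(4, 20, NUM_FEATURES)  # 4 utterances, 20 frames each
print(model(dummy).shape)  # torch.Size([4, 6])
```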
Pages: 882-892 (11 pages)