Emotions Classification from Speech with Deep Learning

Cited: 0
Authors
Chowanda, Andry [1 ]
Muliono, Yohan [2 ]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Cyber Secur Program, Jakarta 11480, Indonesia
Keywords
Emotions recognition; speech modality; temporal information; affective system; NEURAL-NETWORK;
DOI
10.14569/IJACSA.2022.0130490
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Discipline code
081202
Abstract
Emotions are essential in conveying meaning between interlocutors during social interactions. Hence, recognising emotions is paramount in building a good and natural affective system that can interact naturally with human interlocutors. However, recognising emotions from social interactions requires temporal information in order to classify the emotions correctly. This research proposes an architecture that extracts temporal information from the speech modality using a temporal Convolutional Neural Network (CNN) model combined with a Long Short-Term Memory (LSTM) architecture. Several combinations and settings of the architectures were explored and are presented in the paper. The results show that the best classifier was achieved by the model trained with four CNN layers combined with one Bidirectional LSTM layer. Furthermore, that model was trained on an augmented training dataset containing seven times more data than the original training dataset. The best model achieved 94.25% training accuracy, 57.07% validation accuracy, 0.2577 training loss and 1.1678 validation loss. Moreover, Neutral (Calm) and Happy were the easiest classes to recognise, while Angry was the hardest to classify.
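The abstract's best configuration (four CNN layers feeding one Bidirectional LSTM layer) can be sketched roughly as follows. The abstract does not state channel counts, kernel sizes, the input feature dimension, or the number of emotion classes, so the values below (40 spectral features per frame, the channel progression, 8 classes) are illustrative assumptions only, not the authors' actual hyperparameters:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a 4-layer CNN + 1-layer Bidirectional LSTM classifier.
    All hyperparameters here are assumptions; the abstract only fixes the
    layer counts (four CNN layers, one BiLSTM layer)."""
    def __init__(self, n_features=40, n_classes=8):
        super().__init__()
        layers, in_ch = [], n_features
        for out_ch in (64, 128, 128, 256):          # assumed channel widths
            layers += [nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
                       nn.ReLU(),
                       nn.MaxPool1d(2)]              # halve the time axis
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        self.lstm = nn.LSTM(256, 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, n_classes)     # both LSTM directions

    def forward(self, x):                            # x: (batch, feats, time)
        h = self.conv(x)                             # (batch, 256, time/16)
        h = h.transpose(1, 2)                        # LSTM wants (batch, t, ch)
        _, (hn, _) = self.lstm(h)                    # final hidden states
        h = torch.cat([hn[0], hn[1]], dim=1)         # concat fwd + bwd
        return self.fc(h)

model = CNNBiLSTM()
logits = model(torch.randn(2, 40, 128))              # 2 clips, 128 frames
print(logits.shape)                                  # torch.Size([2, 8])
```

The CNN stack compresses the time axis while learning local spectral patterns; the BiLSTM then summarises the remaining sequence in both directions, which is one common way to capture the temporal information the abstract emphasises.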
Pages: 777-781
Page count: 5
References
21 in total
[1]   Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers [J].
Akcay, Mehmet Berkehan ;
Oguz, Kaya .
SPEECH COMMUNICATION, 2020, 116: 56-76
[2]  
Chowanda A., 2021, J BIG DATA-GER, V8, P1
[3]   Recurrent Neural Network to Deep Learn Conversation in Indonesian [J].
Chowanda, Andry ;
Chowanda, Alan Darmasaputra .
DISCOVERY AND INNOVATION OF COMPUTER SCIENCE TECHNOLOGY IN ARTIFICIAL INTELLIGENCE ERA, 2017, 116: 579-586
[4]  
Delbouys Remi., 2018, P 19 INT SOC MUSIC I, P370
[5]  
Ekman Paul, 1999, HDB COGNITION EMOTIO, P45
[6]   Automatic Sarcasm Detection: A Survey [J].
Joshi, Aditya ;
Bhattacharyya, Pushpak ;
Carman, Mark J. .
ACM COMPUTING SURVEYS, 2017, 50 (05)
[7]   Speech Emotion Recognition Using Deep Learning Techniques: A Review [J].
Khalil, Ruhul Amin ;
Jones, Edward ;
Babar, Mohammad Inayatullah ;
Jan, Tariqullah ;
Zafar, Mohammad Haseeb ;
Alhussain, Thamer .
IEEE ACCESS, 2019, 7: 117327-117345
[8]   The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English [J].
Livingstone, Steven R. ;
Russo, Frank A. .
PLOS ONE, 2018, 13 (05)
[9]   Recent trends in deep learning based personality detection [J].
Mehta, Yash ;
Majumder, Navonil ;
Gelbukh, Alexander ;
Cambria, Erik .
ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (04): 2313-2339
[10]   A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition [J].
Mustaqeem ;
Kwon, Soonil .
SENSORS, 2020, 20 (01)