Emotions Classification from Speech with Deep Learning

Cited: 0
Authors
Chowanda, Andry [1 ]
Muliono, Yohan [2 ]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Cyber Secur Program, Jakarta 11480, Indonesia
Keywords
Emotions recognition; speech modality; temporal information; affective system; NEURAL-NETWORK;
DOI
10.14569/IJACSA.2022.0130490
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Discipline code
081202
Abstract
Emotions are essential in conveying meaning between interlocutors during social interactions. Hence, recognising emotions is paramount in building a good and natural affective system that can interact naturally with human interlocutors. However, recognising emotions from social interactions requires temporal information in order to classify the emotions correctly. This research proposes an architecture that extracts temporal information from the speech modality using a temporal Convolutional Neural Network (CNN) model combined with a Long Short-Term Memory (LSTM) architecture. Several combinations and settings of the architectures were explored and are presented in the paper. The results show that the best classifier was achieved by the model trained with four CNN layers combined with one Bidirectional LSTM layer. Furthermore, that model was trained on an augmented training dataset containing seven times more data than the original training dataset. The best model achieved 94.25% training accuracy, 57.07% validation accuracy, 0.2577 training loss and 1.1678 validation loss. Moreover, Neutral (Calm) and Happy were the easiest classes to recognise, while Angry was the hardest to classify.
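The abstract's best configuration (four CNN layers feeding one Bidirectional LSTM layer) can be sketched roughly as follows. The abstract does not state channel counts, kernel sizes, the input feature dimension, or the number of emotion classes, so the values below (40 spectral features per frame, the channel progression, 8 classes) are illustrative assumptions only, not the authors' actual hyperparameters:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a 4-layer CNN + 1-layer Bidirectional LSTM classifier.
    All hyperparameters here are assumptions; the abstract only fixes the
    layer counts (four CNN layers, one BiLSTM layer)."""
    def __init__(self, n_features=40, n_classes=8):
        super().__init__()
        layers, in_ch = [], n_features
        for out_ch in (64, 128, 128, 256):          # assumed channel widths
            layers += [nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
                       nn.ReLU(),
                       nn.MaxPool1d(2)]              # halve the time axis
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        self.lstm = nn.LSTM(256, 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, n_classes)     # both LSTM directions

    def forward(self, x):                            # x: (batch, feats, time)
        h = self.conv(x)                             # (batch, 256, time/16)
        h = h.transpose(1, 2)                        # LSTM wants (batch, t, ch)
        _, (hn, _) = self.lstm(h)                    # final hidden states
        h = torch.cat([hn[0], hn[1]], dim=1)         # concat fwd + bwd
        return self.fc(h)

model = CNNBiLSTM()
logits = model(torch.randn(2, 40, 128))              # 2 clips, 128 frames
print(logits.shape)                                  # torch.Size([2, 8])
```

The CNN stack compresses the time axis while learning local spectral patterns; the BiLSTM then summarises the remaining sequence in both directions, which is one common way to capture the temporal information the abstract emphasises.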
Pages: 777-781
Page count: 5
References
21 in total
[1]   Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers [J].
Akcay, Mehmet Berkehan ;
Oguz, Kaya .
SPEECH COMMUNICATION, 2020, 116: 56-76
[2]  
Chowanda A., 2021, J BIG DATA-GER, V8, P1
[3]   Recurrent Neural Network to Deep Learn Conversation in Indonesian [J].
Chowanda, Andry ;
Chowanda, Alan Darmasaputra .
DISCOVERY AND INNOVATION OF COMPUTER SCIENCE TECHNOLOGY IN ARTIFICIAL INTELLIGENCE ERA, 2017, 116: 579-586
[4]  
Delbouys Remi., 2018, P 19 INT SOC MUSIC I, P370
[5]  
Ekman Paul, 1999, HDB COGNITION EMOTIO, P45
[6]   Automatic Sarcasm Detection: A Survey [J].
Joshi, Aditya ;
Bhattacharyya, Pushpak ;
Carman, Mark J. .
ACM COMPUTING SURVEYS, 2017, 50 (05)
[7]   Speech Emotion Recognition Using Deep Learning Techniques: A Review [J].
Khalil, Ruhul Amin ;
Jones, Edward ;
Babar, Mohammad Inayatullah ;
Jan, Tariqullah ;
Zafar, Mohammad Haseeb ;
Alhussain, Thamer .
IEEE ACCESS, 2019, 7: 117327-117345
[8]   The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English [J].
Livingstone, Steven R. ;
Russo, Frank A. .
PLOS ONE, 2018, 13 (05)
[9]   Recent trends in deep learning based personality detection [J].
Mehta, Yash ;
Majumder, Navonil ;
Gelbukh, Alexander ;
Cambria, Erik .
ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (04): 2313-2339
[10]   A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition [J].
Mustaqeem ;
Kwon, Soonil .
SENSORS, 2020, 20 (01)