Speech-Based Emotion Classification for Human by Introducing Upgraded Long Short-Term Memory (ULSTM)

Cited by: 0
Authors
Bhowmik, Subhrajit [1 ]
Chatterjee, Akshay [1 ]
Biswas, Sampurna [1 ]
Farhin, Reshmina [1 ]
Yasmin, Ghazaala [1 ]
Affiliations
[1] St Thomas Coll Engn & Technol, Dept Comp Sci & Engn, 4 Diamond Harbour Rd, Kolkata 700023, India
Source
COMPUTATIONAL INTELLIGENCE IN PATTERN RECOGNITION, CIPR 2020 | 2020, Vol. 1120
Keywords
Emotion recognition; Feature extraction; Classification model; Convolutional neural network (CNN); Recurrent neural network (RNN); Gated recurrent unit (GRU); Long short-term memory (LSTM); Upgraded long short-term memory (ULSTM)
DOI
10.1007/978-981-15-2449-3_8
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans develop emotional intelligence through social behaviour, by interacting with and imitating other people, and we continually refine our ability to analyse different emotions by learning from experience in our surroundings. What if a machine could learn in the same way through artificial intelligence? Ongoing research explores this question using deep learning models, which are applied to enhance a machine's learning capacity. This capacity matters greatly for human emotion recognition, because one emotion can shade into another and is therefore difficult to analyse. This problem motivates the present work. The proposed method categorizes human emotions using four deep learning models: a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM) and a gated recurrent unit (GRU). These models are trained on well-known physical and perceptual features and tested on the benchmark Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The four models are then compared on this dataset with respect to the vanishing gradient problem. In addition, an upgraded LSTM model (ULSTM) is proposed to achieve better accuracy and is evaluated against the existing LSTM model.
Pages: 101-112
Number of pages: 12