Speech-Based Emotion Classification for Human by Introducing Upgraded Long Short-Term Memory (ULSTM)

Cited by: 0
Authors
Bhowmik, Subhrajit [1 ]
Chatterjee, Akshay [1 ]
Biswas, Sampurna [1 ]
Farhin, Reshmina [1 ]
Yasmin, Ghazaala [1 ]
Affiliations
[1] St Thomas Coll Engn & Technol, Dept Comp Sci & Engn, 4 Diamond Harbour Rd, Kolkata 700023, India
Source
COMPUTATIONAL INTELLIGENCE IN PATTERN RECOGNITION, CIPR 2020 | 2020, Vol. 1120
Keywords
Emotion recognition; Feature extraction; Classification model; Convolutional neural network (CNN); Recurrent neural network (RNN); Gated recurrent unit (GRU); Long short-term memory (LSTM); Upgraded long short-term memory (ULSTM)
DOI
10.1007/978-981-15-2449-3_8
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans develop emotional intelligence through social behaviour, by interacting with and imitating other people, and we continually refine our ability to analyse different emotions by learning from experience in our surroundings. What if a machine could learn in the same way through artificial intelligence? Ongoing research explores this question using deep learning models, which are applied to enhance a machine's learning capacity. This capacity matters greatly for human emotion recognition, because one emotion can shade into another and is therefore difficult to analyse. This problem motivates the present work. The proposed method categorizes human emotions using four deep learning models: a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM) and a gated recurrent unit (GRU). These models are trained on well-known physical and perceptual features and tested on the benchmark Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The four models are then compared on this dataset with respect to the vanishing gradient problem. In addition, an upgraded LSTM model (ULSTM) is proposed to achieve better accuracy and is evaluated against the existing LSTM model.
Pages: 101-112
Number of pages: 12