Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Cited: 55
Authors
Yu, Yeonguk [1 ]
Kim, Yoon-Joong [1 ]
Affiliations
[1] Hanbat Natl Univ, Dept Comp Engn, Daejeon 34158, South Korea
Keywords
speech-emotion recognition; attention mechanism; LSTM;
DOI
10.3390/electronics9050713
CLC number (Chinese Library Classification)
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
We propose a speech-emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The model's attention mechanism focuses on the emotion-related elements of the IS09 and mel-spectrogram features and on the emotion-related time segments of the feature sequence; the model thus extracts emotion information from a given speech signal. The proposed model in the baseline study achieved a weighted accuracy (WA) of 68% on the improvised portion of IEMOCAP. However, neither the proposed model of the main study nor its modified variants could exceed a WA of 68% on the improvised dataset. We attribute this to the reliability limit of the IEMOCAP dataset: a more reliable dataset is required for a more accurate evaluation of model performance. Therefore, in this study we reconstructed a more reliable dataset based on the labeling results provided with IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
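The "attention-LSTM-attention" idea described above can be illustrated with a minimal NumPy sketch: an attention step weights the feature dimensions of each frame, an LSTM layer summarizes the weighted frames over time, and a second attention step pools the hidden states into one utterance-level vector. This is an illustrative sketch under assumed shapes and randomly initialized weights (`W_f`, `W`, `w_t` are hypothetical names), not the authors' exact architecture or trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, F, H = 5, 8, 4   # time steps, feature dim (e.g. IS09 + mel bins), hidden size

x = rng.standard_normal((T, F))          # one utterance: T frames of F features

# 1) Feature attention: weight each feature dimension per frame.
W_f = rng.standard_normal((F, F)) * 0.1
alpha = softmax(x @ W_f, axis=-1)        # (T, F) attention weights over features
x_att = alpha * x                        # emphasized emotion-related features

# 2) A single LSTM layer over time (standard gate equations).
W = rng.standard_normal((4 * H, F + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
hs = []
for t in range(T):
    z = W @ np.concatenate([x_att[t], h]) + b
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(i), sig(f), sig(o)     # input, forget, output gates
    c = f * c + i * np.tanh(g)           # cell state update
    h = o * np.tanh(c)                   # hidden state
    hs.append(h)
H_seq = np.stack(hs)                     # (T, H) hidden states

# 3) Temporal attention: weight frames, pool to one utterance vector.
w_t = rng.standard_normal(H) * 0.1
beta = softmax(H_seq @ w_t, axis=0)      # (T,) weights over time
utt = beta @ H_seq                       # (H,) utterance-level representation

print(utt.shape)  # (4,)
```

In a full model, `utt` would feed a softmax classifier over the emotion classes, and all weights would be learned end to end.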
Pages: 12