Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms

Cited by: 4
Authors
Qamhan, Mustafa A. [1 ]
Meftah, Ali H. [1 ]
Selouani, Sid-Ahmed [2 ]
Alotaibi, Yousef A. [1 ]
Zakariah, Mohammed [1 ]
Seddiq, Yasser Mohammad [3 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
[2] Univ Moncton, 218 Bvd J Gauthier, Shippegan, NB E8S 1P6, Canada
[3] King Abdulaziz City Sci & Technol, Riyadh, Saudi Arabia
Source
2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE) | 2020
Keywords
emotion; classification; Arabic; spectrograms; CNN; LSTM;
DOI
10.1109/ccece47787.2020.9255752
Chinese Library Classification
TP301 [Theory and methods];
Discipline Classification Code
081202;
Abstract
In this study, a speech emotion recognition technique based on a deep neural network, trained on the King Saud University Emotions Arabic dataset, is presented. A convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined to build the primary system, a convolutional recurrent neural network (CRNN). The study further investigates the use of linearly spaced spectrograms as inputs to the emotion recognizers. The performance of the CRNN system is compared with the results of an experiment evaluating the human capability to perceive emotion from speech; this human perceptual evaluation serves as the baseline. The overall CRNN system achieves accuracies of 84.55% and 77.51% at the file and segment levels, respectively, values that are close to the human emotion perception scores.
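As a rough illustration of the CRNN architecture described in the abstract, the sketch below combines a convolutional front end with an LSTM layer over spectrogram frames. It assumes TensorFlow/Keras (both cited in the reference list); the input shape, layer sizes, and number of emotion classes are hypothetical and are not the authors' actual configuration.

# Minimal CRNN sketch for spectrogram-based emotion classification.
# Assumes TensorFlow/Keras; all dimensions below are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5              # hypothetical number of emotion labels
INPUT_SHAPE = (128, 128, 1)  # (time frames, frequency bins, channels), assumed

model = models.Sequential([
    # Convolutional layers extract local time-frequency features
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Collapse the frequency/channel axes so the time axis feeds the LSTM:
    # after two 2x2 poolings the feature map is (32, 32, 64)
    layers.Reshape((32, 32 * 64)),
    # LSTM models temporal dynamics across spectrogram frames
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()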
Pages: 5
References
18 items in total
  • [1] Abadi M., 2015, TENSORFLOW LARGE SCA
  • [2] Boersma P., 2011, PRAAT DOING PHONETIC
  • [3] IEMOCAP: interactive emotional dyadic motion capture database
    Busso, Carlos
    Bulut, Murtaza
    Lee, Chi-Chun
    Kazemzadeh, Abe
    Mower, Emily
    Kim, Samuel
    Chang, Jeannette N.
    Lee, Sungbok
    Narayanan, Shrikanth S.
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) : 335 - 359
  • [4] A tutorial survey of architectures, algorithms, and applications for deep learning
    Deng, Li
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2014, 3
  • [5] Deng L, 2013, INT CONF ACOUST SPEE, P8604, DOI 10.1109/ICASSP.2013.6639345
  • [6] Gulli A., 2017, DEEP LEARNING KERAS
  • [7] Han K, 2014, INTERSPEECH, P223
  • [8] Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music
    Han, Yoonchang
    Kim, Jaehun
    Lee, Kyogu
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01) : 208 - 221
  • [9] Kingma D. P., 2014, INT C LEARNING REPRE, DOI DOI 10.1145/1830483.1830503
  • [10] Le D, 2013, 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P216, DOI 10.1109/ASRU.2013.6707732