Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms

Cited by: 4
Authors
Qamhan, Mustafa A. [1 ]
Meftah, Ali H. [1 ]
Selouani, Sid-Ahmed [2 ]
Alotaibi, Yousef A. [1 ]
Zakariah, Mohammed [1 ]
Seddiq, Yasser Mohammad [3 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
[2] Univ Moncton, 218 Bvd J Gauthier, Shippegan, NB E8S 1P6, Canada
[3] King Abdulaziz City Sci & Technol, Riyadh, Saudi Arabia
Source
2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE) | 2020
Keywords
emotion; classification; Arabic; spectrograms; CNN; LSTM;
DOI
10.1109/ccece47787.2020.9255752
Chinese Library Classification
TP301 [Theory and methods];
Discipline Classification Code
081202;
Abstract
In this study, a speech emotion recognition technique based on a deep neural network, trained on the King Saud University Emotions Arabic dataset, is presented. A convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined to build the primary system, a convolutional recurrent neural network (CRNN). The study further investigates the use of linearly spaced spectrograms as inputs to the emotion recognizers. The performance of the CRNN system is compared with the results of an experiment evaluating the human capability to perceive emotion from speech; this human perceptual evaluation serves as the baseline. The overall CRNN system achieves accuracies of 84.55% and 77.51% at the file and segment levels, respectively, values that are close to the human emotion perception scores.
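As a rough illustration of the CRNN architecture described in the abstract, the sketch below combines a convolutional front end with an LSTM layer over spectrogram frames. It assumes TensorFlow/Keras (both cited in the reference list); the input shape, layer sizes, and number of emotion classes are hypothetical and are not the authors' actual configuration.

# Minimal CRNN sketch for spectrogram-based emotion classification.
# Assumes TensorFlow/Keras; all dimensions below are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5              # hypothetical number of emotion labels
INPUT_SHAPE = (128, 128, 1)  # (time frames, frequency bins, channels), assumed

model = models.Sequential([
    # Convolutional layers extract local time-frequency features
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Collapse the frequency/channel axes so the time axis feeds the LSTM:
    # after two 2x2 poolings the feature map is (32, 32, 64)
    layers.Reshape((32, 32 * 64)),
    # LSTM models temporal dynamics across spectrogram frames
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()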
Pages: 5
References
18 items in total
  • [1] Abadi M., 2015, TENSORFLOW LARGE SCA
  • [2] Boersma P., 2011, PRAAT DOING PHONETIC
  • [3] IEMOCAP: interactive emotional dyadic motion capture database
    Busso, Carlos
    Bulut, Murtaza
    Lee, Chi-Chun
    Kazemzadeh, Abe
    Mower, Emily
    Kim, Samuel
    Chang, Jeannette N.
    Lee, Sungbok
    Narayanan, Shrikanth S.
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) : 335 - 359
  • [4] A tutorial survey of architectures, algorithms, and applications for deep learning
    Deng, Li
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2014, 3
  • [5] Deng L, 2013, INT CONF ACOUST SPEE, P8604, DOI 10.1109/ICASSP.2013.6639345
  • [6] Gulli A., 2017, DEEP LEARNING KERAS
  • [7] Han K, 2014, INTERSPEECH, P223
  • [8] Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music
    Han, Yoonchang
    Kim, Jaehun
    Lee, Kyogu
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01) : 208 - 221
  • [9] Kingma D. P., 2014, INT C LEARNING REPRE, DOI DOI 10.1145/1830483.1830503
  • [10] Le D, 2013, 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P216, DOI 10.1109/ASRU.2013.6707732