Emotion recognition using deep learning approach from audio-visual emotional big data

Times cited: 272
Authors
Hossain, M. Shamim [1]
Muhammad, Ghulam [2]
Affiliations
[1] King Saud Univ, Dept Software Engn, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
[2] King Saud Univ, Dept Comp Engn, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
HEALTH-CARE; SPEECH; MACHINE; NETWORK; SYSTEM
DOI
10.1016/j.inffus.2018.09.008
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an emotion recognition system that applies a deep learning approach to emotional Big Data. The Big Data comprises speech and video. In the proposed system, a speech signal is first processed in the frequency domain to obtain a Mel-spectrogram, which can be treated as an image. This Mel-spectrogram is then fed to a convolutional neural network (CNN). For video signals, representative frames are extracted from each video segment and fed to a second CNN. The outputs of the two CNNs are fused using two consecutive extreme learning machines (ELMs), and the fused output is passed to a support vector machine (SVM) for the final emotion classification. The proposed system is evaluated on two audio-visual emotional databases, one of which is Big Data. Experimental results confirm the effectiveness of the proposed system involving the CNNs and the ELMs.
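The first audio step, turning a speech waveform into a Mel-spectrogram that a CNN can treat as an image, can be sketched roughly as below. This is a minimal pure-Python sketch under assumptions the abstract does not state: the HTK-style Mel formula, a Hamming window, a naive DFT in place of an FFT, log compression, and the frame length, hop size, and number of Mel bands are all illustrative choices, not the authors' settings. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `power_spectrum`, `log_mel_spectrogram`) are hypothetical helpers.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel formula (assumed; the paper's exact variant is not stated here)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters equally spaced on the Mel scale, one row per band."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points: filter j spans points j-1 (left), j (centre), j+1 (right)
    mels = [lo + i * (hi - lo) / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_mels + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):  # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge (peak of 1.0 at centre)
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """|X[k]|^2 / N for k = 0..N/2 via a naive O(N^2) DFT (an FFT is used in practice)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, frame_len=400, hop=160, n_mels=40):
    """Return rows (time frames) x columns (Mel bands): the 'image' fed to a CNN."""
    # Hamming window (assumed) to reduce spectral leakage at the frame edges
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = power_spectrum(frame)
        # weight the power spectrum by each triangular filter, then log-compress
        energies = [sum(f * p for f, p in zip(filt, power)) for filt in bank]
        image.append([math.log(e + 1e-10) for e in energies])
    return image
```

For example, a 1000 Hz tone sampled at 8000 Hz (1024 samples) and analysed with `frame_len=256, hop=128, n_mels=20` yields a 7 × 20 array, and in each row the strongest band is one whose triangle covers DFT bin 32 (1000 × 256 / 8000). The resulting 2-D array is what makes it possible to reuse an image CNN on speech.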
Pages: 69-78
Page count: 10