Multimodal Emotion Recognition Based on Facial Expressions, Speech, and EEG

Cited by: 23
Authors
Pan, Jiahui [1 ]
Fang, Weijie [1 ]
Zhang, Zhihang [1 ]
Chen, Bingzhi [1 ]
Zhang, Zheng [2 ]
Wang, Shuihua [3 ]
Affiliations
[1] South China Normal Univ, Sch Software, Guangzhou 510631, Peoples R China
[2] Harbin Inst Technol, Shenzhen Med Biometr Percept & Anal Engn Lab, Shenzhen 518055, Peoples R China
[3] Univ Leicester, Sch Comp & Math Sci, Leicester LE1 7RH, England
Source
IEEE OPEN JOURNAL OF ENGINEERING IN MEDICINE AND BIOLOGY | 2024, Vol. 5
Keywords
Emotion recognition; Brain modeling; Feature extraction; Electroencephalography; Speech recognition; Convolution; Deep learning; Multimodal emotion recognition; electroencephalogram; facial expressions; speech; NETWORKS; FUSION;
DOI
10.1109/OJEMB.2023.3240280
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Code
0831 ;
Abstract
Goal: As an essential human-machine interaction task, emotion recognition has become an active research area over the past decades. Although previous attempts at emotion classification have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions from different modalities, and 2) given the growing computational cost of deep learning, how to provide real-time detection while improving the robustness of deep neural networks. Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. The proposed Deep-Emotion framework consists of three branches: a facial branch, a speech branch, and an EEG branch. The facial branch uses an improved GhostNet neural network proposed in this paper for feature extraction, which alleviates overfitting during training and improves classification accuracy over the original GhostNet. For the speech branch, we propose a lightweight fully convolutional neural network (LFCNN) for efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt decision-level fusion to integrate the recognition results of the three modalities, yielding more comprehensive and accurate predictions. Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method and the feasibility and superiority of the MER approach.
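The decision-level fusion strategy described above combines the per-class probabilities produced independently by the facial, speech, and EEG branches. The paper does not specify the exact fusion rule in this record, so the sketch below assumes a simple weighted average of the three branches' softmax outputs (the weights and function name are illustrative, not from the paper):

```python
import numpy as np

def decision_level_fusion(probs_face, probs_speech, probs_eeg,
                          weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse per-class probability vectors from three modality branches.

    Each input is a 1-D array of class probabilities from one branch
    (facial, speech, EEG). Returns a fused probability vector; the
    predicted emotion is its argmax.
    """
    stacked = np.stack([probs_face, probs_speech, probs_eeg])  # (3, n_classes)
    w = np.asarray(weights, dtype=float).reshape(-1, 1)        # (3, 1)
    fused = (stacked * w).sum(axis=0)                          # weighted average
    return fused / fused.sum()                                 # renormalize

# Toy example with three emotion classes: the facial and EEG branches
# agree on class 0, so the fused decision is class 0.
face = np.array([0.7, 0.2, 0.1])
speech = np.array([0.2, 0.5, 0.3])
eeg = np.array([0.6, 0.3, 0.1])
fused = decision_level_fusion(face, speech, eeg)
predicted = int(fused.argmax())
```

With equal weights this reduces to majority-style averaging; in practice the weights could be tuned per modality on a validation set.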
Pages: 396-403
Page count: 8
Related Papers
35 records
[1] Abdullah S.M.S.A., 2021, Journal of Applied Science and Technology Trends, V2, P52, DOI 10.38094/JASTT20291
[2] Aftab, Arya; Morsali, Alireza; Ghaemmaghami, Shahrokh; Champagne, Benoit. LIGHT-SERNET: A LIGHTWEIGHT FULLY CONVOLUTIONAL NEURAL NETWORK FOR SPEECH EMOTION RECOGNITION. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 6912-6916.
[3] Andayani, Felicia; Theng, Lau Bee; Tsun, Mark Teekit; Chua, Caslon. Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files. IEEE Access, 2022, 10: 36018-36027.
[4] Chen, Bingzhi; Cao, Qi; Hou, Mixiao; Zhang, Zheng; Lu, Guangming; Zhang, David. Multimodal Emotion Recognition With Temporal and Semantic Consistency. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3592-3603.
[5] Chen, Mingyi; He, Xuanji; Yang, Jing; Zhang, Han. 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition. IEEE Signal Processing Letters, 2018, 25(10): 1440-1444.
[6] Chollet, Francois. Xception: Deep Learning with Depthwise Separable Convolutions. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 1800-1807.
[7] Chowdary, M. Kalpana; Nguyen, Tu N.; Hemanth, D. Jude. Deep learning-based facial emotion recognition for human-computer interaction applications. Neural Computing & Applications, 2023, 35(32): 23311-23328.
[8] Fang, Yuchun; Rong, Ruru; Huang, Jun. Hierarchical fusion of visual and physiological signals for emotion recognition. Multidimensional Systems and Signal Processing, 2021, 32(4): 1103-1121.
[9] Han, Kai; Wang, Yunhe; Tian, Qi; Guo, Jianyuan; Xu, Chunjing; Xu, Chang. GhostNet: More Features from Cheap Operations. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 1577-1586.
[10] He, Zhipeng; Li, Zina; Yang, Fuzhou; Wang, Lei; Li, Jingcong; Zhou, Chengju; Pan, Jiahui. Advances in Multimodal Emotion Recognition Based on Brain-Computer Interfaces. Brain Sciences, 2020, 10(10): 1-29.