Deep CNN with late fusion for real time multimodal emotion recognition

Cited by: 4
Authors
Dixit, Chhavi [1 ]
Satapathy, Shashank Mouli [2 ]
Affiliations
[1] Shell India Markets Pvt Ltd, Bengaluru 560103, Karnataka, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
CNN; Cross dataset; Ensemble learning; FastText; Multimodal emotion recognition; Stacking; Sentiment analysis; Model
DOI
10.1016/j.eswa.2023.122579
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition is a fundamental aspect of human communication and plays a crucial role in many domains. This work develops an efficient model for real-time multimodal emotion recognition in videos of human oration (opinion videos), in which speakers express their opinions on various topics. Four separate datasets are used, contributing 20,000 samples for text, 1,440 for audio, 35,889 for images, and 3,879 videos for multimodal analysis. One model is trained for each modality: fastText for text, chosen for its efficiency, robustness to noise, and pre-trained embeddings; a customized 1-D CNN for audio, exploiting translation invariance, hierarchical feature extraction, scalability, and generalization; and a custom 2-D CNN for images, for its ability to capture local features and handle variation in image content. The models are tested and combined on the CMU-MOSEI dataset using both bagging and stacking to find the most effective architecture, and are then used for real-time analysis of speeches. Each model is trained on 80% of its dataset; the remaining 20% is used to test individual and combined accuracies on CMU-MOSEI. The emotions predicted by the final architecture correspond to the six classes in the CMU-MOSEI dataset. This cross-dataset training and testing makes the models robust and efficient for general use, removes reliance on a specific domain or dataset, and adds more data points for model training. The proposed architecture achieves an accuracy of 85.85% and an F1-score of 83% on the CMU-MOSEI dataset.
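The abstract gives no implementation details, but the stacking step it describes can be illustrated with a minimal sketch: assume each per-modality model (fastText for text, the 1-D CNN for audio, the 2-D CNN for image frames) emits a probability vector over the six CMU-MOSEI emotion classes, and a meta-classifier is trained on their concatenation. Everything below is hypothetical, including the function names and the choice of logistic regression as the meta-model; the authors report evaluating both bagging and stacking.

```python
# Hypothetical late-fusion stacking sketch -- not the authors' code.
# Assumes each per-modality model already outputs a probability
# distribution over the six CMU-MOSEI emotion classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

# The six emotion classes annotated in CMU-MOSEI.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

def fit_stacker(text_probs, audio_probs, image_probs, labels):
    """Train a stacking meta-classifier on concatenated per-modality
    probability vectors (each array has shape n_samples x 6)."""
    meta_features = np.hstack([text_probs, audio_probs, image_probs])  # n x 18
    meta_clf = LogisticRegression(max_iter=1000)  # stand-in meta-model
    meta_clf.fit(meta_features, labels)  # labels: integer class indices 0..5
    return meta_clf

def predict_emotion(meta_clf, text_p, audio_p, image_p):
    """Fuse one sample's per-modality probability vectors into a label."""
    fused = np.hstack([text_p, audio_p, image_p]).reshape(1, -1)
    return EMOTIONS[int(meta_clf.predict(fused)[0])]
```

The bagging variant mentioned in the abstract would, roughly, skip the trained meta-model and average (or majority-vote) the three probability vectors instead.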
Pages: 15