Deep CNN with late fusion for real time multimodal emotion recognition

Cited by: 4
Authors
Dixit, Chhavi [1 ]
Satapathy, Shashank Mouli [2 ]
Affiliations
[1] Shell India Markets Pvt Ltd, Bengaluru 560103, Karnataka, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
CNN; Cross dataset; Ensemble learning; FastText; Multimodal emotion recognition; Stacking; Sentiment analysis; Model
DOI
10.1016/j.eswa.2023.122579
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition is a fundamental aspect of human communication and plays a crucial role in many domains. This work develops an efficient model for real-time multimodal emotion recognition in videos of human oration (opinion videos), in which speakers express their opinions on various topics. Four separate datasets are used, contributing 20,000 samples for text, 1,440 for audio, 35,889 for images, and 3,879 videos for multimodal analysis. One model is trained for each modality: fastText for text, chosen for its efficiency, robustness to noise, and pre-trained embeddings; a customized 1-D CNN for audio, exploiting translation invariance, hierarchical feature extraction, scalability, and generalization; and a custom 2-D CNN for images, for its ability to capture local features and handle variation in image content. The models are tested and combined on the CMU-MOSEI dataset using both bagging and stacking to find the most effective architecture, and are then used for real-time analysis of speeches. Each model is trained on 80% of its dataset; the remaining 20% is used to test individual and combined accuracies on CMU-MOSEI. The emotions predicted by the final architecture correspond to the six classes in the CMU-MOSEI dataset. This cross-dataset training and testing makes the models robust and efficient for general use, removes reliance on a specific domain or dataset, and adds more data points for model training. The proposed architecture achieves an accuracy of 85.85% and an F1-score of 83% on the CMU-MOSEI dataset.
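The abstract gives no implementation details, but the stacking step it describes can be illustrated with a minimal sketch: assume each per-modality model (fastText for text, the 1-D CNN for audio, the 2-D CNN for image frames) emits a probability vector over the six CMU-MOSEI emotion classes, and a meta-classifier is trained on their concatenation. Everything below is hypothetical, including the function names and the choice of logistic regression as the meta-model; the authors report evaluating both bagging and stacking.

```python
# Hypothetical late-fusion stacking sketch -- not the authors' code.
# Assumes each per-modality model already outputs a probability
# distribution over the six CMU-MOSEI emotion classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

# The six emotion classes annotated in CMU-MOSEI.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

def fit_stacker(text_probs, audio_probs, image_probs, labels):
    """Train a stacking meta-classifier on concatenated per-modality
    probability vectors (each array has shape n_samples x 6)."""
    meta_features = np.hstack([text_probs, audio_probs, image_probs])  # n x 18
    meta_clf = LogisticRegression(max_iter=1000)  # stand-in meta-model
    meta_clf.fit(meta_features, labels)  # labels: integer class indices 0..5
    return meta_clf

def predict_emotion(meta_clf, text_p, audio_p, image_p):
    """Fuse one sample's per-modality probability vectors into a label."""
    fused = np.hstack([text_p, audio_p, image_p]).reshape(1, -1)
    return EMOTIONS[int(meta_clf.predict(fused)[0])]
```

The bagging variant mentioned in the abstract would, roughly, skip the trained meta-model and average (or majority-vote) the three probability vectors instead.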
Pages: 15