Multimodal Emotion Recognition via Convolutional Neural Networks: Comparison of different strategies on two multimodal datasets

Cited by: 10
Authors
Bilotti, U. [1 ]
Bisogni, C. [1 ]
De Marsico, M. [2 ]
Tramonte, S.
Affiliations
[1] Univ Salerno, Via Giovanni Paolo II, 132, I-84084 Fisciano, Italy
[2] Sapienza Univ Rome, Via Salaria 113, I-00198 Rome, Italy
Keywords
Emotion recognition; Multimodal emotion recognition; Multi-input model; Biometrics; Deep learning; CIRCUMPLEX MODEL; FEATURES
DOI
10.1016/j.engappai.2023.107708
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The aim of this paper is to investigate emotion recognition using a multimodal approach that exploits convolutional neural networks (CNNs) with multiple inputs. Multimodal approaches allow different modalities to cooperate and generally achieve better performance, because complementary features are extracted from different sources of information. In this work, the facial frames, the optical flow computed from consecutive facial frames, and the Mel spectrograms (from the word melody) are extracted from videos and combined in different ways to determine which modality combination works best. Several experiments are run, first considering one modality at a time so that good accuracy is established on each modality in isolation. Afterward, the models are concatenated to create a final model that accepts multiple inputs. The experiments use the BAUM-1 (Bahcesehir University Multimodal Affective Database-1) and RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) datasets, each of which collects two distinct sets of videos based on the intensity of the expression, i.e., acted/strong or spontaneous/normal, providing representations of the emotional states considered here: angry, disgust, fearful, happy, and sad. The performance of the proposed models is reported through accuracy results and confusion matrices, demonstrating better accuracy than comparable proposals in the literature. The best accuracy achieved on the BAUM-1 dataset is about 95%, while on RAVDESS it is about 95.5%.
Pages: 13
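
As an illustration of the multi-input strategy the abstract describes, below is a minimal PyTorch sketch of a three-branch CNN that accepts facial frames, optical flow fields, and Mel spectrograms and fuses their embeddings by concatenation before classification. This is not the authors' published architecture; the layer sizes, input shapes, and five-class head are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's): a three-branch,
# multi-input CNN fused by concatenation, as the abstract describes at a
# high level. Channel counts and layer sizes are illustrative.
import torch
import torch.nn as nn


class Branch(nn.Module):
    """Small CNN feature extractor; one instance per modality."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size embedding for any input size
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.flatten(self.features(x), start_dim=1)  # (B, 32 * 4 * 4)


class MultiInputEmotionCNN(nn.Module):
    """Concatenates per-modality embeddings, then classifies five emotions."""

    def __init__(self, num_classes: int = 5):  # angry, disgust, fearful, happy, sad
        super().__init__()
        self.face = Branch(in_channels=3)  # RGB facial frame
        self.flow = Branch(in_channels=2)  # dense optical flow (dx, dy)
        self.mel = Branch(in_channels=1)   # Mel spectrogram treated as an image
        self.classifier = nn.Sequential(
            nn.Linear(3 * 32 * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, face, flow, mel):
        fused = torch.cat([self.face(face), self.flow(flow), self.mel(mel)], dim=1)
        return self.classifier(fused)


# Dummy forward pass with a batch of 2 samples.
model = MultiInputEmotionCNN()
face = torch.randn(2, 3, 64, 64)    # facial frames
flow = torch.randn(2, 2, 64, 64)    # optical flow between consecutive frames
mel = torch.randn(2, 1, 128, 128)   # Mel spectrograms
logits = model(face, flow, mel)     # shape: (2, 5)
```

The flow and Mel inputs could be produced, for example, with OpenCV's Farneback dense optical flow and librosa's melspectrogram function; the record above does not specify which tools the authors actually used.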