Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture

Cited by: 0
Authors
Filali, Hajar [1 ,2 ]
Boulealam, Chafik [1 ]
El Fazazy, Khalid [1 ]
Mahraz, Adnane Mohamed [1 ]
Tairi, Hamid [1 ]
Riffi, Jamal [1 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Fac Sci Dhar El Mahraz, Dept Comp Sci, LISAC, Fes 30000, Morocco
[2] ISGA, Lab Innovat Management & Engn Enterprise LIMITE, Fes 30000, Morocco
Keywords
emotion recognition; deep learning; graph convolutional network; capsule network; vision transformer; meaningful neural network (MNN); multimodal architecture;
DOI
10.3390/info16010040
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The development of emotionally intelligent computers depends on emotion recognition from rich multimodal inputs such as text, speech, and visual cues, since the modalities complement one another. Although the value of modeling complex cross-modal relationships for emotion recognition has been demonstrated, these relationships remain largely unexplored. Previous work on learning multimodal representations for emotion classification has relied mainly on fusion mechanisms that simply concatenate information, rather than fully exploiting the benefits of deep learning. In this paper, a unique deep multimodal emotion model is proposed that uses the meaningful neural network (MNN) to learn meaningful multimodal representations while classifying data. Specifically, the proposed model extracts the acoustic modality with a graph convolutional network, the textual modality with a capsule network, and the visual modality with a vision transformer, and then concatenates the resulting features. Building on the effectiveness of the MNN, we use it as a methodological innovation: it is fed the previously generated feature vectors to produce better predictions. Extensive experiments show that the proposed approach yields more accurate multimodal emotion recognition, achieving state-of-the-art results with accuracies of 69% and 56% on two public datasets, MELD and MOSEI, respectively.
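To make the described pipeline concrete, below is a minimal PyTorch sketch of the fusion scheme outlined in the abstract: a graph convolutional branch for audio, a capsule-style branch for text, a small transformer-encoder branch for vision, and a concatenation of the three embeddings followed by an MLP classifier standing in for the MNN. All module names, dimensions, and the simplified branch encoders (single GCN layer, squash-only capsules, one-block ViT) are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the abstract's fusion pipeline; all sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """One graph-convolution layer, H = ReLU(A_hat X W), mean-pooled per graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
    def forward(self, x, a_hat):           # x: (N, in_dim), a_hat: (N, N) normalized adjacency
        h = F.relu(self.lin(a_hat @ x))
        return h.mean(dim=0)                # graph-level acoustic embedding

def squash(s, dim=-1):
    """Capsule squashing nonlinearity (Sabour et al., 2017)."""
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + 1e-8)

class SimpleCapsuleText(nn.Module):
    """Projects pooled token embeddings into capsule vectors (no routing)."""
    def __init__(self, in_dim, n_caps, cap_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_caps * cap_dim)
        self.n_caps, self.cap_dim = n_caps, cap_dim
    def forward(self, tokens):              # tokens: (T, in_dim)
        caps = self.proj(tokens.mean(dim=0)).view(self.n_caps, self.cap_dim)
        return squash(caps).flatten()       # textual embedding

class TinyViT(nn.Module):
    """A one-block transformer encoder over patch embeddings (ViT-style)."""
    def __init__(self, patch_dim, d_model, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=1)
    def forward(self, patches):             # patches: (P, patch_dim)
        h = self.encoder(self.embed(patches).unsqueeze(0))
        return h.mean(dim=1).squeeze(0)     # visual embedding

class MultimodalEmotionNet(nn.Module):
    """Concatenates the three modality embeddings; an MLP head stands in
    for the meaningful neural network (MNN) classifier."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.audio = SimpleGCN(40, 64)
        self.text = SimpleCapsuleText(300, 8, 16)   # 8 capsules x 16 dims = 128
        self.vision = TinyViT(768, 64)
        self.head = nn.Sequential(nn.Linear(64 + 128 + 64, 128),
                                  nn.ReLU(), nn.Linear(128, n_classes))
    def forward(self, audio_x, a_hat, tokens, patches):
        z = torch.cat([self.audio(audio_x, a_hat),
                       self.text(tokens),
                       self.vision(patches)])
        return self.head(z)

# Toy forward pass with random inputs (7 MELD emotion classes assumed).
model = MultimodalEmotionNet()
logits = model(torch.randn(10, 40),                 # 10 acoustic frame nodes
               torch.eye(10),                       # identity as a trivial A_hat
               torch.randn(20, 300),                # 20 word embeddings
               torch.randn(16, 768))                # 16 image patches
print(logits.shape)  # torch.Size([7])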
Pages: 22