GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Cited by: 22
Authors
Li, Jiang [1,2]
Wang, Xiaoping [1,2]
Lv, Guoqing [1,2]
Zeng, Zhigang [1,2]
Affiliations
[1] Huazhong Univ Sci & Technol, Educ Minist China, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Educ Minist China, Key Lab Image Proc & Intelligent Control, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition in conversation; multimodal fusion; graph neural networks; cross-modal feature complementation
DOI
10.1109/TMM.2023.3260635
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Emotion Recognition in Conversation (ERC) plays a significant role in Human-Computer Interaction (HCI) systems because it enables empathetic services. Multimodal ERC can mitigate the drawbacks of unimodal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields owing to their superior performance in relation modeling. In multimodal ERC, GNNs can extract both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN fuse multiple modalities directly, they may generate redundant information and lose modality-specific diversity. In this work, we present a directed-graph-based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity-gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, so that GNNs can extract crucial contextual and interactive information more accurately during message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches.
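The abstract describes GAT-MLP as a graph-attention sublayer followed by an MLP sublayer. The PyTorch sketch below is a minimal illustration of that kind of block, not the paper's implementation: the class name GATMLPLayer, the single-head attention, the hidden size, and the residual/LayerNorm arrangement are all assumptions made for the example. Nodes would correspond to utterance-modality pairs, with the directed adjacency mask encoding the paper's various edge types.

```python
import torch
import torch.nn as nn

class GATMLPLayer(nn.Module):
    """Hypothetical GAT-MLP-style block (assumption, reconstructed
    from the abstract only): masked graph attention, then an MLP,
    each with a residual connection and LayerNorm."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, dim) node features, one node per utterance-modality pair
        # adj: (N, N) boolean mask, True where a directed edge exists
        # add self-loops so every softmax row has at least one valid entry
        adj = adj | torch.eye(adj.size(0), dtype=torch.bool, device=adj.device)
        scores = (self.q(x) @ self.k(x).transpose(-2, -1)) / x.size(-1) ** 0.5
        scores = scores.masked_fill(~adj, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        x = self.norm1(x + attn @ self.v(x))  # graph-attention sublayer
        x = self.norm2(x + self.mlp(x))       # MLP sublayer
        return x

# Toy usage: 3 utterances x 2 modalities -> 6 graph nodes of dimension 64.
x = torch.randn(6, 64)
adj = torch.rand(6, 6) > 0.5  # random directed edges, for illustration only
out = GATMLPLayer(64)(x, adj)
print(out.shape)  # torch.Size([6, 64])
```

In the paper's setting, the adjacency mask would be built per edge type (e.g., intra-modal context edges vs. cross-modal edges) rather than randomly, so that message passing distinguishes contextual from interactive information.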
Pages: 77-89
Number of pages: 13