GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Cited by: 22
Authors
Li, Jiang [1 ,2 ]
Wang, Xiaoping [1 ,2 ]
Lv, Guoqing [1 ,2 ]
Zeng, Zhigang [1 ,2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Educ Minist China, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Educ Minist China, Key Lab Image Proc & Intelligent Control, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition in conversation; multimodal fusion; graph neural networks; cross-modal feature complementation;
DOI
10.1109/TMM.2023.3260635
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems, since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of unimodal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, since existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity-gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches.
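The abstract describes GAT-MLP only at a high level: a graph attention step combined with an MLP inside a unified framework, operating on a graph whose nodes span multiple modalities. The sketch below is a minimal PyTorch illustration of such a "graph attention followed by MLP" block, not the authors' implementation; the class name GATMLPBlock, the dense adjacency mask, the residual/LayerNorm arrangement, and the toy node layout are all assumptions for illustration, and the paper's subspace extractors, typed edges, and PairCC strategy are not reproduced here.

```python
# Minimal sketch (assumed design, not the published GraphCFC code) of a
# graph-attention + MLP block of the kind the abstract calls GAT-MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GATMLPBlock(nn.Module):
    """Single-head graph attention over a dense adjacency mask, followed by an MLP."""

    def __init__(self, dim: int, mlp_hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)     # node feature projection
        self.attn = nn.Linear(2 * dim, 1, bias=False)   # scores concatenated node pairs
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_hidden), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(mlp_hidden, dim),
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; adj: (N, N) mask, 1 where an edge exists
        h = self.proj(x)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )                                                   # (N, N, 2*dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1), 0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend to real edges only
        alpha = torch.softmax(scores, dim=-1)                 # attention over neighbours
        x = self.norm1(x + alpha @ h)                          # attention aggregation + residual
        x = self.norm2(x + self.mlp(x))                        # position-wise MLP + residual
        return x


if __name__ == "__main__":
    nodes = torch.randn(6, 32)                # e.g. 3 utterances x 2 modalities (toy layout)
    adj = (torch.rand(6, 6) > 0.5).float()    # random stand-in for intra-/cross-modal edges
    adj.fill_diagonal_(1)                     # self-loops keep the softmax well defined
    out = GATMLPBlock(dim=32)(nodes, adj)
    print(out.shape)                          # torch.Size([6, 32])
```

In a GraphCFC-style conversation graph, the nodes would be utterance representations per modality and the adjacency mask would encode the various intra- and inter-modal edge types the abstract mentions; here a random mask merely stands in for that structure.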
Pages: 77-89
Number of pages: 13
Related Papers
50 records in total
  • [1] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition. Cao Xiaopeng; Zhang Linying; Chen Qiuxian; Ning Hailong; Dong Yizhuo. The Journal of China Universities of Posts and Telecommunications, 2024, 31(06): 16-25
  • [2] MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion. Khan, Mustaqeem; Tran, Phuong-Nam; Pham, Nhat Truong; El Saddik, Abdulmotaleb; Othmani, Alice. SCIENTIFIC REPORTS, 2025, 15(01)
  • [3] Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition. Hong, Soyeon; Kang, Hyeoungguk; Cho, Hyunsouk. IEEE ACCESS, 2024, 12: 14324-14333
  • [4] Cross-modal credibility modelling for EEG-based multimodal emotion recognition. Zhang, Yuzhe; Liu, Huan; Wang, Di; Zhang, Dalin; Lou, Tianyu; Zheng, Qinghua; Quek, Chai. JOURNAL OF NEURAL ENGINEERING, 2024, 21(02)
  • [5] Multichannel Multimodal Emotion Analysis of Cross-Modal Feedback Interactions Based on Knowledge Graph. Dong, Shaohua; Fan, Xiaochao; Ma, Xinchun. NEURAL PROCESSING LETTERS, 2024, 56(03)
  • [6] A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition. Zhang, Xiaoheng; Cui, Weigang; Hu, Bin; Li, Yang. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15(03): 1553-1566
  • [7] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition. Zhao, Huan; Li, Bo; Zhang, Zixing. INTERSPEECH 2023, 2023: 2718-2722
  • [8] Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition. Bhatti, Anubhav; Behinaein, Behnam; Rodenburg, Dirk; Hungler, Paul; Etemad, Ali. 2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2021
  • [9] A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition. Li, Chiqin; Xie, Lun; Shao, Xingmao; Pan, Hang; Wang, Zhiliang. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [10] Multimodal Graph Learning for Cross-Modal Retrieval. Xie, Jingyou; Zhao, Zishuo; Lin, Zhenzhou; Shen, Ying. PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023: 145-153