A joint hierarchical cross-attention graph convolutional network for multi-modal facial expression recognition

Cited by: 1
Authors
Xu, Chujie [1 ]
Du, Yong [1 ]
Wang, Jingzi [2 ]
Zheng, Wenjie [1 ]
Li, Tiejun [1 ]
Yuan, Zhansheng [1 ]
Affiliations
[1] Jimei Univ, Sch Ocean Informat Engn, Xiamen, Peoples R China
[2] Natl Chengchi Univ, Dept Comp Sci, Chengchi, Taiwan
Keywords
cross-attention mechanism; emotional recognition in conversations; graph convolution network; IoT; multi-modal fusion; transformer; emotion recognition; valence
DOI
10.1111/coin.12607
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotional recognition in conversations (ERC) is increasingly being applied in various IoT devices. Deep learning-based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods adopt attention mechanisms to fuse information from different modalities, they ignore the complementarity between modalities. The joint cross-attention model was introduced to alleviate this issue; however, it does not utilize multi-scale feature information across modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers together with context relationships for facial expression recognition based on audio-visual (A-V) information. Specifically, we adopt separate deep networks to extract features from each modality. For the V modality, we construct graph data from patch embeddings extracted by the transformer encoder, and we embed a graph convolution, which leverages intra-modality relationships, within the transformer encoder. The deep features from different layers are then fed to a hierarchical fusion module to enhance the feature representation. Finally, a joint cross-attention mechanism exploits the complementary inter-modality relationships. To validate the proposed model, we conducted extensive experiments on the AffWild2 and CMU-MOSI datasets. The results confirm that our model achieves highly promising performance compared with the joint cross-attention model and other methods.
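The joint cross-attention fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustrative simplification, not the authors' exact formulation: the function name `joint_cross_attention`, the projection matrices `Wa`/`Wv`, and the tanh cross-correlation are all assumptions standing in for the learned components of the paper's model. The core idea shown is that each modality attends to a joint (concatenated) audio-visual representation rather than only to the other modality, so complementary inter-modality information re-weights each modality's own features.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_cross_attention(A, V, Wa, Wv):
    """Hedged sketch of joint cross-attention fusion.

    A:  (T, da) audio features over T time steps (assumed shape).
    V:  (T, dv) visual features over T time steps (assumed shape).
    Wa: (da, da + dv), Wv: (dv, da + dv) -- stand-ins for the
        learned projections of the actual model.
    """
    # Joint representation: concatenation of both modalities.
    J = np.concatenate([A, V], axis=1)                   # (T, da + dv)

    # Cross-correlation of each modality with the joint representation,
    # scaled by sqrt of the joint dimension (illustrative choice).
    scale = np.sqrt(J.shape[1])
    Ca = np.tanh(A @ Wa @ J.T / scale)                   # (T, T)
    Cv = np.tanh(V @ Wv @ J.T / scale)                   # (T, T)

    # Attention weights over time steps, then re-weight each modality's
    # own features with the joint-aware attention.
    A_att = softmax(Ca, axis=-1) @ A                     # (T, da)
    V_att = softmax(Cv, axis=-1) @ V                     # (T, dv)
    return A_att, V_att

# Tiny usage example with random features.
rng = np.random.default_rng(0)
T, da, dv = 4, 8, 6
A = rng.standard_normal((T, da))
V = rng.standard_normal((T, dv))
Wa = rng.standard_normal((da, da + dv)) * 0.1
Wv = rng.standard_normal((dv, da + dv)) * 0.1
A_att, V_att = joint_cross_attention(A, V, Wa, Wv)
```

The attended features `A_att` and `V_att` keep the per-modality dimensionality, so they can be concatenated or summed for a downstream classifier, which is the usual role of such a fusion block.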
Pages: 18
Related papers
50 records
  • [1] Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 4693 - 4697
  • [2] Multi-Modal Recurrent Attention Networks for Facial Expression Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 6977 - 6991
  • [3] Multi-Level Multi-Modal Cross-Attention Network for Fake News Detection
    Ying, Long
    Yu, Hui
    Wang, Jinguang
    Ji, Yongze
    Qian, Shengsheng
    IEEE ACCESS, 2021, 9 : 132363 - 132373
  • [4] Multi-branch convolutional neural network with cross-attention mechanism for emotion recognition
    Yan, Fei
    Guo, Zekai
    Iliyasu, Abdullah M.
    Hirota, Kaoru
    SCIENTIFIC REPORTS, 2025, 15 (1)
  • [5] Object Interaction Recommendation with Multi-Modal Attention-based Hierarchical Graph Neural Network
    Zhang, Huijuan
    Liang, Lipeng
    Wang, Dongqing
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 295 - 305
  • [6] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [7] Predicting transcription factor binding sites by a multi-modal representation learning method based on cross-attention network
    Wei, Yuxiao
    Zhang, Qi
    Liu, Liwei
    APPLIED SOFT COMPUTING, 2024, 166
  • [8] Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations
    Zhang, Duzhen
    Chen, Feilong
    Chang, Jianlong
    Chen, Xiuyi
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3987 - 3997
  • [9] Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition
    Sun, Mingyi
    Cui, Weigang
    Zhang, Yue
    Yu, Shuyue
    Liao, Xiaofeng
    Hu, Bin
    Li, Yang
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (12) : 11823 - 11832
  • [10] Attention-Based Multi-Modal Multi-View Fusion Approach for Driver Facial Expression Recognition
    Chen, Jianrong
    Dey, Sujit
    Wang, Lei
    Bi, Ning
    Liu, Peng
    IEEE ACCESS, 2024, 12 : 137203 - 137221