A joint hierarchical cross-attention graph convolutional network for multi-modal facial expression recognition

Cited by: 1
Authors
Xu, Chujie [1 ]
Du, Yong [1 ]
Wang, Jingzi [2 ]
Zheng, Wenjie [1 ]
Li, Tiejun [1 ]
Yuan, Zhansheng [1 ]
Affiliations
[1] Jimei Univ, Sch Ocean Informat Engn, Xiamen, Peoples R China
[2] Natl Chengchi Univ, Dept Comp Sci, Chengchi, Taiwan
Keywords
cross-attention mechanism; emotional recognition in conversations; graph convolution network; IoT; multi-modal fusion; transformer; emotion recognition; valence
DOI
10.1111/coin.12607
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotional recognition in conversations (ERC) is increasingly being applied in various IoT devices. Deep learning-based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods adopt attention mechanisms to fuse information from different modalities, they ignore the complementarity between modalities. The joint cross-attention model was introduced to alleviate this issue; however, it does not utilize multi-scale feature information across modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers together with context relationships for facial expression recognition based on audio-visual (A-V) information. Specifically, we adopt separate deep networks to extract features from each modality. For the V modality, we construct graph data from patch embeddings extracted by the transformer encoder, and we embed a graph convolution, which leverages intra-modality relationships, within the transformer encoder. The deep features from different layers are then fed to a hierarchical fusion module to enhance the feature representation. Finally, a joint cross-attention mechanism exploits the complementary inter-modality relationships. To validate the proposed model, we conducted extensive experiments on the AffWild2 and CMU-MOSI datasets. The results confirm that our model achieves highly promising performance compared with the joint cross-attention model and other methods.
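The joint cross-attention fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustrative simplification, not the authors' exact formulation: the function name `joint_cross_attention`, the projection matrices `Wa`/`Wv`, and the tanh cross-correlation are all assumptions standing in for the learned components of the paper's model. The core idea shown is that each modality attends to a joint (concatenated) audio-visual representation rather than only to the other modality, so complementary inter-modality information re-weights each modality's own features.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_cross_attention(A, V, Wa, Wv):
    """Hedged sketch of joint cross-attention fusion.

    A:  (T, da) audio features over T time steps (assumed shape).
    V:  (T, dv) visual features over T time steps (assumed shape).
    Wa: (da, da + dv), Wv: (dv, da + dv) -- stand-ins for the
        learned projections of the actual model.
    """
    # Joint representation: concatenation of both modalities.
    J = np.concatenate([A, V], axis=1)                   # (T, da + dv)

    # Cross-correlation of each modality with the joint representation,
    # scaled by sqrt of the joint dimension (illustrative choice).
    scale = np.sqrt(J.shape[1])
    Ca = np.tanh(A @ Wa @ J.T / scale)                   # (T, T)
    Cv = np.tanh(V @ Wv @ J.T / scale)                   # (T, T)

    # Attention weights over time steps, then re-weight each modality's
    # own features with the joint-aware attention.
    A_att = softmax(Ca, axis=-1) @ A                     # (T, da)
    V_att = softmax(Cv, axis=-1) @ V                     # (T, dv)
    return A_att, V_att

# Tiny usage example with random features.
rng = np.random.default_rng(0)
T, da, dv = 4, 8, 6
A = rng.standard_normal((T, da))
V = rng.standard_normal((T, dv))
Wa = rng.standard_normal((da, da + dv)) * 0.1
Wv = rng.standard_normal((dv, da + dv)) * 0.1
A_att, V_att = joint_cross_attention(A, V, Wa, Wv)
```

The attended features `A_att` and `V_att` keep the per-modality dimensionality, so they can be concatenated or summed for a downstream classifier, which is the usual role of such a fusion block.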
Pages: 18
Related papers
50 records
  • [1] Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 4693 - 4697
  • [2] Multi-Modal Recurrent Attention Networks for Facial Expression Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 6977 - 6991
  • [3] Multi-Level Multi-Modal Cross-Attention Network for Fake News Detection
    Ying, Long
    Yu, Hui
    Wang, Jinguang
    Ji, Yongze
    Qian, Shengsheng
    IEEE ACCESS, 2021, 9 : 132363 - 132373
  • [4] Multi-branch convolutional neural network with cross-attention mechanism for emotion recognition
    Yan, Fei
    Guo, Zekai
    Iliyasu, Abdullah M.
    Hirota, Kaoru
    SCIENTIFIC REPORTS, 2025, 15 (1)
  • [5] Object Interaction Recommendation with Multi-Modal Attention-based Hierarchical Graph Neural Network
    Zhang, Huijuan
    Liang, Lipeng
    Wang, Dongqing
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 295 - 305
  • [6] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [7] Predicting transcription factor binding sites by a multi-modal representation learning method based on cross-attention network
    Wei, Yuxiao
    Zhang, Qi
    Liu, Liwei
    APPLIED SOFT COMPUTING, 2024, 166
  • [8] Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations
    Zhang, Duzhen
    Chen, Feilong
    Chang, Jianlong
    Chen, Xiuyi
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3987 - 3997
  • [9] Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition
    Sun, Mingyi
    Cui, Weigang
    Zhang, Yue
    Yu, Shuyue
    Liao, Xiaofeng
    Hu, Bin
    Li, Yang
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (12) : 11823 - 11832
  • [10] Attention-Based Multi-Modal Multi-View Fusion Approach for Driver Facial Expression Recognition
    Chen, Jianrong
    Dey, Sujit
    Wang, Lei
    Bi, Ning
    Liu, Peng
    IEEE ACCESS, 2024, 12 : 137203 - 137221