Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images

Cited by: 1
Authors
Tsai, Jia-Hua [1 ]
Chu, Wei-Ta [1 ]
Institutions
[1] Natl Cheng Kung Univ, Tainan, Taiwan
Source
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2022 | 2022
Keywords
action recognition; cross-modal attention; feature fusion;
DOI
10.1145/3551626.3564960
CLC Number
TP39 [Computer Applications];
Discipline Code
081203; 0835;
Abstract
We propose a cross-modal attention module that combines information from different cues and different modalities to achieve action recognition in still images. Feature maps are extracted from the entire image, the detected human bounding box, and the detected human skeleton, respectively. Inspired by the transformer structure, we design cross-attention between the query vector from one cue/modality and the key vector from another cue/modality. Feature maps from different cues/modalities are cross-referred so that better representations can be obtained, yielding better performance. We show that the proposed framework outperforms state-of-the-art systems without requiring an extra training dataset. We also conduct ablation studies to investigate how different settings impact the final results.
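The cross-modal attention described in the abstract can be sketched as scaled dot-product attention in which queries come from one cue/modality (e.g. the whole image) and keys/values from another (e.g. the skeleton). The function names below, and the assumption that both feature maps are already projected to a common dimension, are illustrative; the paper's exact projections and fusion steps are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats, value_feats):
    """Attend from one cue/modality (query) to another (key/value).

    query_feats: (n_q, d) feature rows from one cue, e.g. the entire image
    key_feats, value_feats: (n_k, d) feature rows from another cue,
        e.g. the detected skeleton
    Returns an (n_q, d) representation of the query cue refined by the other cue.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (n_q, n_k) similarities
    weights = softmax(scores, axis=-1)               # attention over the other cue
    return weights @ value_feats                     # cross-referred features
```

In a two-cue setup, this would be applied in both directions (image attending to skeleton and skeleton attending to image) and the results fused, mirroring the cross-referring of feature maps the abstract describes.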
Pages: 5
Related Papers
50 in total
  • [1] Cross-modal attention and letter recognition
    Wesner, Michael
    Miller, Lisa
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2008, 43 (3-4) : 343 - 343
  • [2] Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition
    Wan, Xin
    Wang, Yongxiong
    Wang, Zhe
    Tang, Yiheng
    Liu, Benke
    PHYSIOLOGICAL MEASUREMENT, 2024, 45 (07)
  • [3] IMPLICIT ATTENTION-BASED CROSS-MODAL COLLABORATIVE LEARNING FOR ACTION RECOGNITION
    Zhang, Jianghao
    Zhong, Xian
    Liu, Wenxuan
    Jiang, Kui
    Yang, Zhengwei
    Wang, Zheng
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3020 - 3024
  • [4] Cross-Modal Learning with 3D Deformable Attention for Action Recognition
    Kim, Sangwon
    Ahn, Dasom
    Ko, Byoung Chul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10231 - 10241
  • [5] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [6] Cmf-transformer: cross-modal fusion transformer for human action recognition
    Wang, Jun
    Xia, Limin
    Wen, Xin
    MACHINE VISION AND APPLICATIONS, 2024, 35 (05)
  • [7] Cross-modal contrastive learning for multimodal sentiment recognition
    Yang, Shanliang
    Cui, Lichao
    Wang, Lei
    Wang, Tao
    APPLIED INTELLIGENCE, 2024, 54 (05) : 4260 - 4276
  • [8] CROSS-MODAL KNOWLEDGE DISTILLATION FOR ACTION RECOGNITION
    Thoker, Fida Mohammad
    Gall, Juergen
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 6 - 10
  • [9] Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks
    Quan, Zhibang
    Sun, Tao
    Su, Mengli
    Wei, Jishu
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022