Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images

Cited by: 1
Authors
Tsai, Jia-Hua [1 ]
Chu, Wei-Ta [1 ]
Institutions
[1] Natl Cheng Kung Univ, Tainan, Taiwan
Source
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2022 | 2022
Keywords
action recognition; cross-modal attention; feature fusion;
DOI
10.1145/3551626.3564960
CLC Number
TP39 [Computer Applications];
Discipline Code
081203; 0835;
Abstract
We propose a cross-modal attention module that combines information from different cues and different modalities to achieve action recognition in still images. Feature maps are extracted from the entire image, the detected human bounding box, and the detected human skeleton, respectively. Inspired by the transformer structure, we design cross-attention between the query vector from one cue/modality and the key vector from another cue/modality. Feature maps from different cues/modalities are cross-referred so that better representations can be obtained, yielding better performance. We show that the proposed framework outperforms state-of-the-art systems without requiring an extra training dataset. We also conduct ablation studies to investigate how different settings impact the final results.
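The cross-modal attention described in the abstract can be sketched as scaled dot-product attention in which queries come from one cue/modality (e.g. the whole image) and keys/values from another (e.g. the skeleton). The function names below, and the assumption that both feature maps are already projected to a common dimension, are illustrative; the paper's exact projections and fusion steps are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats, value_feats):
    """Attend from one cue/modality (query) to another (key/value).

    query_feats: (n_q, d) feature rows from one cue, e.g. the entire image
    key_feats, value_feats: (n_k, d) feature rows from another cue,
        e.g. the detected skeleton
    Returns an (n_q, d) representation of the query cue refined by the other cue.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (n_q, n_k) similarities
    weights = softmax(scores, axis=-1)               # attention over the other cue
    return weights @ value_feats                     # cross-referred features
```

In a two-cue setup, this would be applied in both directions (image attending to skeleton and skeleton attending to image) and the results fused, mirroring the cross-referring of feature maps the abstract describes.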
Pages: 5
Related Papers
50 in total
  • [1] Cross-modal attention and letter recognition
    Wesner, Michael
    Miller, Lisa
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2008, 43 (3-4) : 343 - 343
  • [2] Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition
    Wan, Xin
    Wang, Yongxiong
    Wang, Zhe
    Tang, Yiheng
    Liu, Benke
    PHYSIOLOGICAL MEASUREMENT, 2024, 45 (07)
  • [3] IMPLICIT ATTENTION-BASED CROSS-MODAL COLLABORATIVE LEARNING FOR ACTION RECOGNITION
    Zhang, Jianghao
    Zhong, Xian
    Liu, Wenxuan
    Jiang, Kui
    Yang, Zhengwei
    Wang, Zheng
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3020 - 3024
  • [4] Cross-Modal Learning with 3D Deformable Attention for Action Recognition
    Kim, Sangwon
    Ahn, Dasom
    Ko, Byoung Chul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10231 - 10241
  • [5] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [6] Cmf-transformer: cross-modal fusion transformer for human action recognition
    Wang, Jun
    Xia, Limin
    Wen, Xin
    MACHINE VISION AND APPLICATIONS, 2024, 35 (05)
  • [7] Cross-modal contrastive learning for multimodal sentiment recognition
    Yang, Shanliang
    Cui, Lichao
    Wang, Lei
    Wang, Tao
    APPLIED INTELLIGENCE, 2024, 54 (05) : 4260 - 4276
  • [8] CROSS-MODAL KNOWLEDGE DISTILLATION FOR ACTION RECOGNITION
    Thoker, Fida Mohammad
    Gall, Juergen
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 6 - 10
  • [9] Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks
    Quan, Zhibang
    Sun, Tao
    Su, Mengli
    Wei, Jishu
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022