MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Authors
Madan, Surbhi [1]
Jain, Rishabh [1]
Sharma, Gulshan [1]
Subramanian, Ramanathan [2]
Dhall, Abhinav [1, 3]
Affiliations
[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India
[2] Univ Canberra, Canberra, ACT, Australia
[3] Monash Univ, Clayton, Vic, Australia
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
Bodily Behavior; Multiview Attention; DCT; Transformer
DOI
10.1145/3581783.3612858
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
Pages: 9526-9530
Page count: 5