MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Cited by: 0

Authors:

Madan, Surbhi [1]

Jain, Rishabh [1]

Sharma, Gulshan [1]

Subramanian, Ramanathan [2]

Dhall, Abhinav [1,3]

Affiliations:

[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India

[2] Univ Canberra, Canberra, ACT, Australia

[3] Monash Univ, Clayton, Vic, Australia

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Bodily Behavior; Multiview Attention; DCT; Transformer;

DOI:

10.1145/3581783.3612858

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR

Pages: 9526-9530

Page count: 5
