Causality-inspired representation learning for weakly supervised skeleton-based action recognition

Times cited: 0
Authors
Wang, Kun [1]
Cao, Jiuxin [1]
Ge, Jiawei [1]
Liu, Chang [1]
Liu, Bo [1]
Affiliations
[1] Southeast University, Nanjing 210000, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Skeleton-based Action Recognition; Weakly supervised learning; Representation learning; WEARABLE SENSORS; NETWORK;
DOI
10.1016/j.knosys.2025.114042
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skeleton-based action recognition has become an important research topic in computer vision. Most existing methods tackle this task by mapping skeletal features to fine-grained labels. However, weakly labeled samples, which contain both relevant and irrelevant human instances, inevitably emerge during data collection and annotation, adversely affecting model training and inference. To address this challenge, we use a structural causal model to formalize the Weakly Supervised Skeleton-based Action Recognition (WS-SAR) problem, treating relevant and irrelevant instances as causal and non-causal factors, respectively. Our primary goal is to learn representations of relevant human instances, i.e., causal factors, from weakly labeled data in WS-SAR. Based on this formulation, we propose a Causality-inspired Representation Learning (CiRL) framework, comprising the Causality DEtection TRansformer (C-DETR) and Supervised Contrastive Learning (Sup-CL). C-DETR uses learned embeddings as class queries and employs class-matching together with causality-enhanced contrastive learning to extract causal factors from both sample-level and instance-level features. The Sup-CL training strategy then applies supervised contrastive learning to capture the causal representations shared among weakly labeled samples of the same class. Experimental results show that our framework achieves state-of-the-art performance on multiple datasets, including WL-NTU, IT-NTU120, and SBU. The source code is available at https://github.com/KennCoder7/CiRL.
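Note: the abstract does not spell out the Sup-CL objective. For orientation only, the sketch below shows the generic supervised contrastive loss commonly used in the literature, in which samples sharing a class label are pulled together in embedding space and all other samples are pushed away. It is not taken from the authors' implementation (see the repository linked above for that); the function names, the temperature value, and the toy inputs are assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length (assumption: norm > 0)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    Hypothetical helper for illustration, not the paper's Sup-CL code.
    For each anchor i, every other sample with the same label is a
    positive; all other samples (positives and negatives) form the
    softmax denominator.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchor_count = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denominator = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchor_count += 1
    return total / anchor_count if anchor_count else 0.0
```

With this formulation, a batch whose same-class embeddings are close together yields a small loss, while the same embeddings paired with mismatched labels yield a much larger one.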
Pages: 12