Causality-inspired representation learning for weakly supervised skeleton-based action recognition

Times cited: 0

Authors:

Wang, Kun [1]

Cao, Jiuxin [1]

Ge, Jiawei [1]

Liu, Chang [1]

Liu, Bo [1]

Affiliation:

[1] Southeast Univ, Nanjing 210000, Jiangsu, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Vol. 326

Funding:

National Natural Science Foundation of China;

Keywords:

Skeleton-based Action Recognition; Weakly supervised learning; Representation learning; WEARABLE SENSORS; NETWORK;

DOI:

10.1016/j.knosys.2025.114042

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Skeleton-based action recognition has become an important research topic in computer vision. Most existing methods tackle this task by mapping skeletal features to fine-grained labels. However, weakly labeled samples, which contain both relevant and irrelevant human instances, inevitably arise during data collection and annotation and adversely affect model training and inference. To address this challenge, we use a structural causal model to formalize the Weakly Supervised Skeleton-based Action Recognition (WS-SAR) problem, treating relevant and irrelevant instances as causal and non-causal factors, respectively. Our primary goal in WS-SAR is to learn representations of the relevant human instances, i.e., the causal factors, from weakly labeled data. Based on this formulation, we propose a Causality-inspired Representation Learning (CiRL) framework comprising the Causality DEtection TRansformer (C-DETR) and a Supervised Contrastive Learning (Sup-CL) training strategy. C-DETR uses learned embeddings as class queries and employs class matching together with causality-enhanced contrastive learning to extract causal factors from both sample-level and instance-level features. The Sup-CL training strategy then applies supervised contrastive learning to capture the causal representations shared among weakly labeled samples of the same class. Experimental results show that our framework achieves state-of-the-art performance on multiple datasets, including WL-NTU, IT-NTU120, and SBU. The source code is available at https://github.com/KennCoder7/CiRL.
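The abstract does not spell out the Sup-CL objective. As a rough illustration only, the sketch below implements the standard supervised contrastive (SupCon) loss, in which every other sample in the batch that shares an anchor's label is treated as a positive and all other samples form the softmax denominator. This is an assumption about the general technique, not the CiRL implementation (the linked repository is the authoritative source). The function name `supervised_contrastive_loss`, the temperature default of 0.1, and the use of one embedding per sample are all hypothetical.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard SupCon loss over a batch (illustrative sketch, not CiRL's code).

    features:    (N, D) array, one embedding per sample (hypothetical input).
    labels:      (N,) integer class labels; same-label samples are positives.
    temperature: softmax temperature tau (hypothetical default).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Positives for anchor i: every *other* sample with the same label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    num_pos = pos_mask.sum(axis=1)
    valid = num_pos > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # L2-normalise so dot products become cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    # An anchor never contrasts with itself: drop it from the denominator.
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other samples a != i.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    # Mean log-probability of each anchor's positives, averaged over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / num_pos[valid]).mean())
```

In a CiRL-style setup one would presumably feed the embeddings of the relevant (causal) instances in a mini-batch, so that weakly labeled samples of the same class are pulled together while samples of other classes are pushed apart; how the paper selects and pools those embeddings is not stated in the abstract.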

Pages: 12

Number of cited references: 64
