Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0

Authors:

Ye, Kenan ^{[1,2]}

Zhao, Brian Nlong ^{[3]}

Liang, Shuang ^{[1,2]}

Yao, Han ^{[1,2]}

Jia, Wenzhen ^{[1,2]}

Affiliations:

[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China

[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China

[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;

DOI:

10.1109/TCSS.2024.3525083

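Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract: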

3-D action recognition has become a fast-moving field in recent years. However, traditional approaches have limitations. They either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or treat actions as a whole and overlook the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework draws on the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also remains aware of the skeleton topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) that fuses the action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding encoding and context aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
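
The self-supervision step in the abstract, contrasting predicted embeddings with actual encoded embeddings, can be illustrated with a small sketch. The abstract does not state which contrastive objective is used, so the InfoNCE form, the cosine similarity, the temperature value, and the names `cosine` and `info_nce` below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(predicted, encoded, temperature=0.1):
    """Mean InfoNCE loss (assumed objective, not confirmed by the abstract).

    predicted[i] is treated as the positive match for encoded[i];
    every other encoded[j], j != i, serves as a negative.
    """
    losses = []
    for i, p in enumerate(predicted):
        logits = [cosine(p, e) / temperature for e in encoded]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Toy data: three "actual" embeddings and predictions close to each of them.
encoded = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# The same predictions paired with the wrong targets.
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = info_nce(aligned, encoded)
loss_shuffled = info_nce(shuffled, encoded)
print(loss_aligned < loss_shuffled)
```

Under these assumptions, the loss is lower when each prediction lies close to its own target embedding and far from the others, which is the sense in which such an objective rewards distinct embeddings.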


Pages: 15

50 items in total

[1] View Enhanced Jigsaw Puzzle for Self-Supervised Feature Learning in 3D Human Action Recognition
You, Wei
Wang, Xue
IEEE ACCESS, 2022, 10 : 36385 - 36396
[2] Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition
Yang, Yang
Liu, Guangjun
Gao, Xuehao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8623 - 8634
[3] Localized Linear Temporal Dynamics for Self-Supervised Skeleton Action Recognition
Wang, Xinghan
Mu, Yadong
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10189 - 10199
[4] Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers
Xiao, Yao
Xiang, Hua
Wang, Tongxi
Wang, Yiju
IEEE ACCESS, 2024, 12 : 134133 - 134143
[5] SSRL: Self-Supervised Spatial-Temporal Representation Learning for 3D Action Recognition
Jin, Zhihao
Wang, Yifan
Wang, Qicong
Shen, Yehu
Meng, Hongying
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 274 - 285
[6] Self-Supervised Video-Based Action Recognition With Disturbances
Lin, Wei
Ding, Xinghao
Huang, Yue
Zeng, Huanqiang
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2493 - 2507
[7] Collaboratively Self-Supervised Video Representation Learning for Action Recognition
Zhang, Jie
Wan, Zhifan
Hu, Lanqing
Lin, Stephen
Wu, Shuzhe
Shan, Shiguang
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1895 - 1907
[8] Attention-guided mask learning for self-supervised 3D action recognition
Zhang, Haoyuan
COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (06) : 7487 - 7496
[9] Global and Local Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition
Hu, Jinhua
Hou, Yonghong
Guo, Zihui
Gao, Jiajun
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 10578 - 10589
[10] Bayesian Contrastive Learning with Manifold Regularization for Self-Supervised Skeleton Based Action Recognition
Lin, Lilang
Zhang, Jiahang
Liu, Jiaying
2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
