Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
CLC classification number
TP3 [Computing technology; computer technology];
Discipline code
0812;
Abstract
3-D action recognition has become a fast-moving field in recent years, yet traditional approaches have limitations: they either model overly detailed yet redundant information by reconstructing the coordinates of every body joint, or treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is inspired by the observation that the continuity of motion and pose variation induces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions; this graph-based encoding not only captures richer high-level semantics but also remains aware of the skeleton topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. In addition, we introduce a context-aware topological attention (CTA) mechanism; positioned between the embedding-encoding and context-aggregation phases, CTA amplifies the features of context-relevant nodes. Finally, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings, which explicitly learns changes in dynamics and yields distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
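To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the overall flow: short-term clips are encoded with a graph convolution, reweighted by a simple context-conditioned attention standing in for CTA, aggregated recurrently (a plain GRU cell standing in for the GraphGRU), and trained by contrasting the context-predicted embedding against the actually encoded embedding with an InfoNCE loss. All module names, tensor shapes, the identity adjacency matrix, and the loss details are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of the abstract's pipeline; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    """One graph convolution over the skeleton adjacency: relu(A X W)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)            # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                           # x: (B, J, in_dim)
        return F.relu(self.proj(torch.einsum("ij,bjc->bic", self.adj, x)))

class ContextTopologicalAttention(nn.Module):
    """Reweights joint features by relevance to the running context vector
    (a stand-in for the CTA mechanism described in the abstract)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, x, context):                  # x: (B, J, D), context: (B, D)
        ctx = context.unsqueeze(1).expand_as(x)     # broadcast context to every joint
        attn = torch.sigmoid(self.score(torch.cat([x, ctx], dim=-1)))  # (B, J, 1)
        return x * attn                             # amplify context-relevant joints

def info_nce(pred, target, temperature=0.1):
    """InfoNCE between predicted and encoded embeddings within the batch."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)

class ContrastiveSkeletonModel(nn.Module):
    def __init__(self, num_joints=25, in_dim=3, dim=64):
        super().__init__()
        adj = torch.eye(num_joints)                 # placeholder adjacency (identity)
        self.encoder = GraphConv(in_dim, dim, adj)  # short-term embedding encoder
        self.cta = ContextTopologicalAttention(dim)
        self.gru = nn.GRUCell(num_joints * dim, dim)  # stand-in for the GraphGRU
        self.predictor = nn.Linear(dim, dim)        # predicts the next embedding

    def forward(self, clips):                       # clips: (B, T, J, in_dim)
        B, T, J, _ = clips.shape
        h = clips.new_zeros(B, self.gru.hidden_size)
        loss = 0.0
        for t in range(T):
            emb = self.encoder(clips[:, t])         # (B, J, D) short-term embedding
            emb = self.cta(emb, h)                  # context-aware reweighting
            target = emb.mean(dim=1)                # pooled "actual" embedding
            if t > 0:                               # need some context before predicting
                pred = self.predictor(h)            # embedding predicted from context
                loss = loss + info_nce(pred, target.detach())
            h = self.gru(emb.flatten(1), h)         # aggregate context recurrently
        return loss / (T - 1), h                    # h serves as the action feature

if __name__ == "__main__":
    model = ContrastiveSkeletonModel()
    clips = torch.randn(8, 10, 25, 3)               # 8 sequences, 10 short-term steps
    loss, feature = model(clips)
    print(loss.item(), feature.shape)               # scalar loss, (8, 64) features

Detaching the target embedding in the contrastive term is one simple way to discourage collapse between the predictor and the encoder; the actual system may handle this, and the graph-structured recurrence, differently.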
Pages: 15