Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Citations: 0

Authors:

Ye, Kenan ^{[1,2]}

Zhao, Brian Nlong ^{[3]}

Liang, Shuang ^{[1,2]}

Yao, Han ^{[1,2]}

Jia, Wenzhen ^{[1,2]}

Affiliations:

[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China

[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China

[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;

DOI:

10.1109/TCSS.2024.3525083

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

3-D action recognition has become a fast-moving field in recent years. However, traditional approaches have limitations. They either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework draws on the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also maintains an awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) for the fusion of action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding encoding and context aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
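
Illustrative note: the abstract's last step, contrasting predicted embeddings with actual encoded embeddings, could be realized with an InfoNCE-style loss. The sketch below is only an assumption about what such an objective might look like; the abstract does not name the loss, and the function names (`cosine`, `info_nce`), the temperature value, and the toy 2-D vectors are all hypothetical. It uses plain Python lists instead of the graph-based encoder described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(predicted, encoded, temperature=0.1):
    """Mean InfoNCE-style loss (hypothetical stand-in for the paper's objective).

    predicted[i] is treated as the positive match for encoded[i];
    every other encoded[j] acts as a negative.
    """
    losses = []
    for i, p in enumerate(predicted):
        logits = [cosine(p, e) / temperature for e in encoded]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the true match.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy data: three "actual" embeddings and predictions close to each of them.
encoded = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same predictions paired with the wrong targets.
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = info_nce(aligned, encoded)
loss_shuffled = info_nce(shuffled, encoded)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when each prediction is paired with its own encoded embedding than when the pairing is shuffled, which is the behavior a contrastive objective of this kind is meant to reward.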

Pages: 15

Related items: 50 in total

[21] Salt3DNet: A Self-Supervised Learning Framework for 3-D Salt Segmentation
Yang, Liuqing
Fomel, Sergey
Wang, Shoudong
Chen, Xiaohong
Saad, Omar M.
Chen, Yangkang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
[22] Self-supervised temporal autoencoder for egocentric action segmentation
Zhang, Mingming
Liu, Dong
Hu, Shizhe
Yan, Xiaoqiang
Sun, Zhongchuan
Ye, Yangdong
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
[23] Self-Supervised Sub-Action Parsing Network for Semi-Supervised Action Quality Assessment
Gedamu, Kumie
Ji, Yanli
Yang, Yang
Shao, Jie
Shen, Heng Tao
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 6057 - 6070
[24] Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition
Yang, Huaigang
Zhang, Qieshi
Ren, Ziliang
Yuan, Huaqiang
Zhang, Fuyong
HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2024, 14
[25] Improving self-supervised action recognition from extremely augmented skeleton sequences
Guo, Tianyu
Liu, Mengyuan
Liu, Hong
Wang, Guoquan
Li, Wenhao
PATTERN RECOGNITION, 2024, 150
[26] Language-Skeleton Pre-training to Collaborate with Self-Supervised Human Action Recognition
Liu, Yi
Liu, Ruyi
Xin, Wentian
Miao, Qiguang
Hu, Yuzhi
Qi, Jiahao
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII, 2025, 15037 : 409 - 423
[27] Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
Planamente, Mirco
Bottino, Andrea
Caputo, Barbara
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 8751 - 8758
[28] Self-Supervised Learning via Multi-Transformation Classification for Action Recognition
Duc-Quang Vu
Ngan Le
Wang, Jia-Ching
2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024
[29] Self-supervised Learning for Unintentional Action Prediction
Zatsarynna, Olga
Abu Farha, Yazan
Gall, Juergen
PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 429 - 444
[30] Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition
Liu, Mengyuan
Liu, Hong
Guo, Tianyu
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2024, 54 (06) : 743 - 752
