Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
CLC classification number
TP3 [Computing technology; computer technology];
Discipline code
0812;
Abstract
3-D action recognition has become a fast-moving field in recent years, yet traditional approaches have limitations: they either model overly detailed yet redundant information by reconstructing the coordinates of every body joint, or treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is inspired by the observation that the continuity of motion and pose variation induces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions; this graph-based encoding not only captures richer high-level semantics but also remains aware of the skeleton topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. In addition, we introduce a context-aware topological attention (CTA) mechanism; positioned between the embedding-encoding and context-aggregation phases, CTA amplifies the features of context-relevant nodes. Finally, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings, which explicitly learns changes in dynamics and yields distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
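To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the overall flow: short-term clips are encoded with a graph convolution, reweighted by a simple context-conditioned attention standing in for CTA, aggregated recurrently (a plain GRU cell standing in for the GraphGRU), and trained by contrasting the context-predicted embedding against the actually encoded embedding with an InfoNCE loss. All module names, tensor shapes, the identity adjacency matrix, and the loss details are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of the abstract's pipeline; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    """One graph convolution over the skeleton adjacency: relu(A X W)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)            # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                           # x: (B, J, in_dim)
        return F.relu(self.proj(torch.einsum("ij,bjc->bic", self.adj, x)))

class ContextTopologicalAttention(nn.Module):
    """Reweights joint features by relevance to the running context vector
    (a stand-in for the CTA mechanism described in the abstract)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, x, context):                  # x: (B, J, D), context: (B, D)
        ctx = context.unsqueeze(1).expand_as(x)     # broadcast context to every joint
        attn = torch.sigmoid(self.score(torch.cat([x, ctx], dim=-1)))  # (B, J, 1)
        return x * attn                             # amplify context-relevant joints

def info_nce(pred, target, temperature=0.1):
    """InfoNCE between predicted and encoded embeddings within the batch."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)

class ContrastiveSkeletonModel(nn.Module):
    def __init__(self, num_joints=25, in_dim=3, dim=64):
        super().__init__()
        adj = torch.eye(num_joints)                 # placeholder adjacency (identity)
        self.encoder = GraphConv(in_dim, dim, adj)  # short-term embedding encoder
        self.cta = ContextTopologicalAttention(dim)
        self.gru = nn.GRUCell(num_joints * dim, dim)  # stand-in for the GraphGRU
        self.predictor = nn.Linear(dim, dim)        # predicts the next embedding

    def forward(self, clips):                       # clips: (B, T, J, in_dim)
        B, T, J, _ = clips.shape
        h = clips.new_zeros(B, self.gru.hidden_size)
        loss = 0.0
        for t in range(T):
            emb = self.encoder(clips[:, t])         # (B, J, D) short-term embedding
            emb = self.cta(emb, h)                  # context-aware reweighting
            target = emb.mean(dim=1)                # pooled "actual" embedding
            if t > 0:                               # need some context before predicting
                pred = self.predictor(h)            # embedding predicted from context
                loss = loss + info_nce(pred, target.detach())
            h = self.gru(emb.flatten(1), h)         # aggregate context recurrently
        return loss / (T - 1), h                    # h serves as the action feature

if __name__ == "__main__":
    model = ContrastiveSkeletonModel()
    clips = torch.randn(8, 10, 25, 3)               # 8 sequences, 10 short-term steps
    loss, feature = model(clips)
    print(loss.item(), feature.shape)               # scalar loss, (8, 64) features

Detaching the target embedding in the contrastive term is one simple way to discourage collapse between the predictor and the encoder; the actual system may handle this, and the graph-structured recurrence, differently.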
Pages: 15