Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1,2]
Zhao, Brian Nlong [3]
Liang, Shuang [1,2]
Yao, Han [1,2]
Jia, Wenzhen [1,2]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
3-D action recognition has become a fast-paced field in recent years. However, traditional approaches have limitations. They either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is motivated by the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also maintains an awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding encoding and context aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
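The abstract does not say which contrastive objective is used to compare predicted and encoded embeddings. As a rough illustration only, the sketch below assumes an InfoNCE-style loss, where each predicted embedding should score highest against its own encoded embedding and the other encoded embeddings act as negatives. The function names `dot` and `info_nce`, the `temperature` value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(predicted, encoded, temperature=0.1):
    """Mean InfoNCE-style loss (assumed objective, not the paper's stated one).

    predicted[i] is treated as the positive match for encoded[i];
    every other encoded[j] serves as a negative for predicted[i].
    """
    losses = []
    for i, p in enumerate(predicted):
        # Similarity of this prediction to every encoded embedding.
        logits = [dot(p, e) / temperature for e in encoded]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true match.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: predictions that line up with their targets give a low loss,
# while mismatched targets give a high loss.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(aligned, aligned)
loss_mismatched = info_nce(aligned, shuffled)
```

In this toy setup `loss_matched` is smaller than `loss_mismatched`, which is the behavior a contrastive objective of this kind rewards during training.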
Pages: 15
Related papers
50 in total
  • [21] Salt3DNet: A Self-Supervised Learning Framework for 3-D Salt Segmentation
    Yang, Liuqing
    Fomel, Sergey
    Wang, Shoudong
    Chen, Xiaohong
    Saad, Omar M.
    Chen, Yangkang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [22] Self-supervised temporal autoencoder for egocentric action segmentation
    Zhang, Mingming
    Liu, Dong
    Hu, Shizhe
    Yan, Xiaoqiang
    Sun, Zhongchuan
    Ye, Yangdong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [23] Self-Supervised Sub-Action Parsing Network for Semi-Supervised Action Quality Assessment
    Gedamu, Kumie
    Ji, Yanli
    Yang, Yang
    Shao, Jie
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 6057 - 6070
  • [24] Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition
    Yang, Huaigang
    Zhang, Qieshi
    Ren, Ziliang
    Yuan, Huaqiang
    Zhang, Fuyong
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2024, 14
  • [25] Improving self-supervised action recognition from extremely augmented skeleton sequences
    Guo, Tianyu
    Liu, Mengyuan
    Liu, Hong
    Wang, Guoquan
    Li, Wenhao
    PATTERN RECOGNITION, 2024, 150
  • [26] Language-Skeleton Pre-training to Collaborate with Self-Supervised Human Action Recognition
    Liu, Yi
    Liu, Ruyi
    Xin, Wentian
    Miao, Qiguang
    Hu, Yuzhi
    Qi, Jiahao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII, 2025, 15037 : 409 - 423
  • [27] Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
    Planamente, Mirco
    Bottino, Andrea
    Caputo, Barbara
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8751 - 8758
  • [28] Self-Supervised Learning via Multi-Transformation Classification for Action Recognition
    Duc-Quang Vu
    Ngan Le
    Wang, Jia-Ching
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,
  • [29] Self-supervised Learning for Unintentional Action Prediction
    Zatsarynna, Olga
    Abu Farha, Yazan
    Gall, Juergen
    PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 429 - 444
  • [30] Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition
    Liu, Mengyuan
    Liu, Hong
    Guo, Tianyu
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2024, 54 (06) : 743 - 752