Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited: 0
Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2025
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
CLC classification number
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
3-D action recognition has become a fast-growing field in recent years, but traditional approaches have limitations: they either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is built on the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also maintains an awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. Additionally, we introduce a context-aware topological attention (CTA) mechanism: positioned between the embedding-encoding and context-aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actually encoded embeddings, which explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
Pages: 15
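The abstract describes a pipeline of short-term embedding encoding via spatio-temporal graph convolutions, context-aware topological attention, GraphGRU-based context aggregation, and a contrastive objective between predicted and encoded embeddings. The following is a minimal, speculative PyTorch sketch of such a pipeline, written only to illustrate the described structure: all module names, tensor shapes, and hyperparameters are assumptions, a plain nn.GRU stands in for the paper's GraphGRU, and nothing here is the authors' implementation.

# Speculative sketch of the pipeline described in the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class STGraphConv(nn.Module):
    """Spatio-temporal graph convolution: graph conv over joints, temporal conv over frames."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)            # (J, J) joint adjacency, assumed normalized
        self.spatial = nn.Linear(in_dim, out_dim)   # per-joint feature transform
        self.temporal = nn.Conv1d(out_dim, out_dim, kernel_size=3, padding=1)

    def forward(self, x):                           # x: (B, T, J, C_in)
        x = torch.einsum("ij,btjc->btic", self.adj, self.spatial(x))
        B, T, J, C = x.shape
        x = self.temporal(x.permute(0, 2, 3, 1).reshape(B * J, C, T))
        return x.reshape(B, J, C, T).permute(0, 3, 1, 2)

class ContextTopologicalAttention(nn.Module):
    """Re-weights joint (node) features, amplifying those most relevant to the context."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                           # x: (B, T, J, C)
        w = torch.softmax(self.score(x), dim=2)     # attention over joints
        return x * w * x.size(2)                    # rescale by J to keep feature magnitude

class ContrastiveContextModel(nn.Module):
    def __init__(self, adj, in_dim=3, dim=128):
        super().__init__()
        self.encoder = STGraphConv(in_dim, dim, adj)           # short-term action embeddings
        self.cta = ContextTopologicalAttention(dim)
        self.aggregator = nn.GRU(dim, dim, batch_first=True)   # stand-in for GraphGRU
        self.predictor = nn.Linear(dim, dim)

    def forward(self, clips):                       # clips: (B, T, J, 3) skeleton sequences
        z = self.cta(self.encoder(clips)).mean(dim=2)   # (B, T, dim), pooled over joints
        ctx, _ = self.aggregator(z[:, :-1])             # aggregate context from past embeddings
        pred = self.predictor(ctx[:, -1])               # predicted next embedding
        target = z[:, -1]                               # actually encoded embedding
        return pred, target

def info_nce(pred, target, tau=0.1):
    """Contrast each predicted embedding against the encoded embeddings in the batch."""
    logits = F.normalize(pred, dim=1) @ F.normalize(target, dim=1).T / tau
    return F.cross_entropy(logits, torch.arange(pred.size(0), device=pred.device))

Under the same assumptions, ContrastiveContextModel(adj=torch.eye(25)) applied to a random batch torch.randn(8, 20, 25, 3) (eight clips of 20 frames and 25 joints) returns one predicted and one encoded embedding per clip, and info_nce(pred, target) contrasts them across the batch, mirroring the self-supervision described in the abstract.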