Long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition

Citations: 0
Authors
Rustogi, Anshul [1]
Mukherjee, Snehasis [1]
Affiliations
[1] Shiv Nadar Univ, Delhi NCR, India
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Action Recognition; Self-supervised learning; Contrastive Learning; Skeleton; Action Prediction;
DOI
10.1109/IJCNN55064.2022.9892535
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed significant developments in research on human action recognition based on skeleton data. The graphical representation of the human skeleton, available with the dataset, provides an opportunity to apply Graph Convolutional Networks (GCN) for efficient analysis of deep spatial-temporal information from the joint and skeleton structure. Most current works in skeleton action recognition use the temporal aspect of the video in short-term sequences, ignoring the long-term information present in the evolving skeleton sequence. The proposed long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition uses an encoder-decoder module. The encoder collects deep global-level (long-term) information from the complete action sequence using efficient self-supervision. The proposed encoder combines knowledge from the temporal domain with high-level information about the relative joint and structure movements of the skeleton. The decoder serves two purposes: predicting the human activity and predicting the skeleton structure in future frames. The decoder primarily uses the high-level encodings from the encoder to anticipate the action. For predicting skeleton structure, we extract an even deeper correlation in the spatio-temporal domain and merge it with the original frame of the video. We apply a contrastive framework in the frame prediction part so that similar actions have similar predicted skeleton structures. The use of the contrastive framework throughout the proposed model helps achieve exemplary performance while adding a self-supervised aspect to the model. We test our model on the NTU-RGB-D-60 dataset and achieve state-of-the-art performance. The code related to this work is available at: https://github.com/AnshulRustogi/Long-Term-Spatio-Temporal-Framework.
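The abstract does not state which contrastive objective is used. As a rough illustration of the idea that embeddings of similar sequences should be pulled together and others pushed apart, the sketch below implements a generic InfoNCE-style loss in NumPy. The choice of InfoNCE, the function name `info_nce_loss`, the `temperature` parameter, and the embedding shapes are all assumptions made for this example and are not taken from the paper or its repository.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss for paired embeddings of shape (N, D).

    Hypothetical helper: row i of `positives` is treated as the positive
    for row i of `anchors`; all other rows in the batch act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (N, N) matrix of scaled similarities; the diagonal holds the positives.
    logits = a @ p.T / temperature

    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)

    # Row-wise log-softmax, then average the negative log-probability
    # assigned to the matching (diagonal) pair.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

In a setting like the one described, `anchors` and `positives` could be encoder outputs for two views of the same skeleton sequence; the loss is small when matching rows are the most similar pair in the batch and grows when they are not.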
Pages: 8