Long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition

Cited: 0
Authors
Rustogi, Anshul [1 ]
Mukherjee, Snehasis [1 ]
Affiliations
[1] Shiv Nadar University, Delhi NCR, India
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Action Recognition; Self-supervised learning; Contrastive Learning; Skeleton; Action Prediction;
DOI
10.1109/IJCNN55064.2022.9892535
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed significant developments in research on human action recognition based on skeleton data. The graphical representation of the human skeleton, available with the dataset, provides an opportunity to apply Graph Convolutional Networks (GCNs) for efficient analysis of deep spatial-temporal information from the joint and skeleton structure. Most current works in skeleton action recognition use the temporal aspect of the video in short-term sequences, ignoring the long-term information present in the evolving skeleton sequence. The proposed long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition uses an encoder-decoder module. The encoder collects deep global-level (long-term) information from the complete action sequence using efficient self-supervision. The proposed encoder combines knowledge from the temporal domain with high-level information on the relative joint and structure movements of the skeleton. The decoder serves two purposes: predicting the human activity and predicting the skeleton structure in future frames. The decoder primarily uses the high-level encodings from the encoder to anticipate the action. For predicting the skeleton structure, we extract an even deeper correlation in the Spatio-temporal domain and merge it with the original frame of the video. We apply a contrastive framework in the frame-prediction part so that similar actions have similar predicted skeleton structures. The use of the contrastive framework throughout the proposed model helps achieve exemplary performance while adding a self-supervised aspect to the model. We test our model on the NTU RGB+D 60 dataset and achieve state-of-the-art performance. The code related to this work is available at: https://github.com/AnshulRustogi/Long-Term-Spatio-Temporal-Framework.
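The abstract's core idea — a contrastive objective that pulls embeddings of similar actions together and pushes dissimilar ones apart — can be illustrated with a generic NT-Xent-style loss. This is a minimal NumPy sketch, not the authors' implementation; the function name, batch layout (two augmented views per sequence), and temperature value are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Generic NT-Xent contrastive loss over two batches of embeddings.

    z1[i] and z2[i] are embeddings of two views of the same action
    sequence (a positive pair); every other pairing in the combined
    batch acts as a negative. Illustrative sketch only.
    """
    # Stack both views and L2-normalise so dot products are cosine similarities.
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z1.shape[0]

    sim = z @ z.T / temperature        # (2n, 2n) scaled similarity matrix
    np.fill_diagonal(sim, -np.inf)     # a sample is never its own negative

    # Row i's positive sits at i + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Cross-entropy of each row's softmax against its positive index.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), pos]).mean()
```

Because the log-sum-exp over each row always dominates the positive's similarity, the loss is non-negative, and it shrinks as matched views of the same action align more closely than the negatives.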
Pages: 8