Long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition

Cited by: 0
Authors
Rustogi, Anshul [1]
Mukherjee, Snehasis [1]
Affiliation
[1] Shiv Nadar Univ, Delhi NCR, India
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Action Recognition; Self-supervised learning; Contrastive Learning; Skeleton; Action Prediction;
DOI
10.1109/IJCNN55064.2022.9892535
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed significant progress in human action recognition based on skeleton data. The graph representation of the human skeleton, available with the dataset, makes it possible to apply Graph Convolutional Networks (GCN) for efficient analysis of deep spatial-temporal information from the joint and skeleton structure. Most current works in skeleton action recognition use the temporal aspect of the video in short-term sequences, ignoring the long-term information present in the evolving skeleton sequence. The proposed long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition uses an encoder-decoder module. The encoder collects deep global-level (long-term) information from the complete action sequence using efficient self-supervision. The proposed encoder combines knowledge from the temporal domain with high-level information on the relative joint and structure movements of the skeleton. The decoder serves two purposes: predicting the human activity and predicting the skeleton structure in future frames. The decoder primarily uses the high-level encodings from the encoder to anticipate the action. For predicting the skeleton structure, we extract an even deeper correlation in the spatio-temporal domain and merge it with the original frame of the video. We apply a contrastive framework in the frame prediction part so that similar actions have similar predicted skeleton structures. The use of the contrastive framework throughout the proposed model helps achieve exemplary performance while adding a self-supervised component to the model. We test our model on the NTU-RGB-D-60 dataset and achieve state-of-the-art performance. The code related to this work is available at: https://github.com/AnshulRustogi/Long-Term-Spatio-Temporal-Framework.
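The abstract does not state which contrastive objective is used. As a rough illustration of the general idea that embeddings of similar actions are pulled together while others are pushed apart, the following is a minimal sketch of an InfoNCE-style loss in plain Python. The function names, the use of cosine similarity, and the temperature value are all assumptions chosen for illustration; this is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper, not from the paper).

    anchors[i] is expected to match positives[i]; every other entry of
    `positives` acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate.
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Cross-entropy with the matching pair as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy embeddings standing in for encoded skeleton sequences (assumed data).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]    # each anchor paired with a similar vector
shuffled = [[0.0, 1.0], [1.0, 0.0]]   # each anchor paired with a dissimilar vector

loss_aligned = info_nce_loss(anchors, aligned)
loss_shuffled = info_nce_loss(anchors, shuffled)
```

In this toy setup the loss is much lower when each anchor is paired with a similar embedding than when the pairs are mismatched, which is the behaviour a contrastive objective is meant to reward.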
Pages: 8
Related papers
50 in total
  • [31] SKELETON-BASED ACTION RECOGNITION WITH SYNCHRONOUS LOCAL AND NON-LOCAL SPATIO-TEMPORAL LEARNING AND FREQUENCY ATTENTION
    Hu, Guyue
    Cui, Bo
    Yu, Shan
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1216 - 1221
  • [32] Long-Term Temporal Convolutions for Action Recognition
    Varol, Gul
    Laptev, Ivan
    Schmid, Cordelia
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (06) : 1510 - 1517
  • [33] JointContrast: Skeleton-Based Mutual Action Recognition with Contrastive Learning
    Jia, Xiangze
    Zhang, Ji
    Wang, Zhen
    Luo, Yonglong
    Chen, Fulong
    Xiao, Jing
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2022, 13631 : 478 - 489
  • [34] Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates
    Liu, Jun
    Shahroudy, Amir
    Xu, Dong
    Kot, Alex C.
    Wang, Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (12) : 3007 - 3021
  • [35] RE-STNet: relational enhancement spatio-temporal networks based on skeleton action recognition
    Chen, Hongwei
    He, Shiqi
    Chen, Zexi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 84 (8) : 4049 - 4069
  • [36] SDE-Net: Skeleton Action Recognition Based on Spatio-Temporal Dependence Enhanced Networks
    Sun, Qing
    Liang, Jiuzhen
    Zhou, Xinwen
    Liu, Hao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 380 - 392
  • [37] Spatio-temporal fusion and contrastive learning for urban flow prediction
    Zhang, Xu
    Gong, Yongshun
    Zhang, Chengqi
    Wu, Xiaoming
    Guo, Ying
    Lu, Wenpeng
    Zhao, Long
    Dong, Xiangjun
    KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [38] Semi-CNN Architecture for Effective Spatio-Temporal Learning in Action Recognition
    Leong, Mei Chee
    Prasad, Dilip K.
    Lee, Yong Tsui
    Lin, Feng
    APPLIED SCIENCES-BASEL, 2020, 10 (02):
  • [39] Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
    Li, Tianjiao
    Foo, Lin Geng
    Ke, Qiuhong
    Rahmani, Hossein
    Wang, Anran
    Wang, Jinghua
    Liu, Jun
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 386 - 403
  • [40] A Spatio-Temporal Deep Learning Approach For Human Action Recognition in Infrared Videos
    Shah, Anuj K.
    Ghosh, Ripul
    Akula, Aparna
    OPTICS AND PHOTONICS FOR INFORMATION PROCESSING XII, 2018, 10751