Long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition

Cited by: 0
Authors
Rustogi, Anshul [1]
Mukherjee, Snehasis [1]
Affiliations
[1] Shiv Nadar Univ, Delhi NCR, India
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Action Recognition; Self-supervised learning; Contrastive Learning; Skeleton; Action Prediction;
DOI
10.1109/IJCNN55064.2022.9892535
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent years have witnessed significant developments in research on human action recognition based on skeleton data. The graphical representation of the human skeleton, available with the dataset, provides an opportunity to apply Graph Convolutional Networks (GCN) for efficient analysis of deep spatial-temporal information from the joint and skeleton structure. Most current works in skeleton action recognition use the temporal aspect of the video in short-term sequences, ignoring the long-term information present in the evolving skeleton sequence. The proposed long-term Spatio-temporal Contrastive Learning framework for Skeleton Action Recognition uses an encoder-decoder module. The encoder collects deep global-level (long-term) information from the complete action sequence using efficient self-supervision. The proposed encoder combines knowledge from the temporal domain with high-level information about the relative joint and structure movements of the skeleton. The decoder serves two purposes: predicting the human activity and predicting the skeleton structure in future frames. The decoder primarily uses the high-level encodings from the encoder to anticipate the action. For predicting skeleton structure, we extract an even deeper correlation in the spatio-temporal domain and merge it with the original frame of the video. We apply a contrastive framework in the frame prediction part so that similar actions have similar predicted skeleton structures. The use of the contrastive framework throughout the proposed model helps achieve exemplary performance while adding a self-supervised aspect to the model. We test our model on the NTU RGB+D 60 dataset and achieve state-of-the-art performance. The code related to this work is available at: https://github.com/AnshulRustogi/Long-Term-Spatio-Temporal-Framework.
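The abstract's idea that similar actions should yield similar predicted skeleton structures can be illustrated with a generic supervised-contrastive-style loss. The sketch below is an assumption for illustration only and is not taken from the paper or its repository: the function names, the cosine similarity, the temperature value, and the toy "predicted skeleton" vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised-contrastive-style loss (hypothetical helper).

    For each anchor, samples sharing its action label are positives and
    every other sample in the batch acts as a contrast. The loss is low
    when same-label embeddings are close and different-label ones are far.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-action partner for this anchor in the batch
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy flattened "predicted skeletons" (made-up values, not real data):
# samples 0 and 1 are close to each other, as are samples 2 and 3.
predicted = [
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.2],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]

# Labels that agree with the geometry versus labels that cut across it.
aligned = contrastive_loss(predicted, ["wave", "wave", "sit", "sit"])
mixed = contrastive_loss(predicted, ["wave", "sit", "wave", "sit"])
print(f"aligned labels: {aligned:.4f}, mixed labels: {mixed:.4f}")
```

In this toy setup the loss is lower when the labels match the clusters of predictions, which is the behavior a contrastive objective of this kind rewards during training.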
Pages: 8