Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition

Citations: 0
Authors
Yang, Huaigang [1 ]
Zhang, Qieshi [2 ,3 ]
Ren, Ziliang [1 ,2 ]
Yuan, Huaqiang [1 ]
Zhang, Fuyong [1 ]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, CAS Key Lab Human Machine Intelligence Synergy Sys, Shenzhen, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based Action Recognition; Contrastive Learning; Self-attention; Knowledge Distillation; Skeleton; Segmentation; Networks; LSTM;
DOI
10.22967/HCIS.2024.14.070
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Since self-supervised learning does not require large amounts of labelled data, several methods have applied self-supervised contrastive learning to 3D skeleton-based action recognition. Although the skeleton sequence is a highly correlated data modality, current work considers only the global skeleton sequence, which is formed into different views by data augmentation and fed into the contrastive encoding network. It neglects the local semantic information of the skeleton, so certain fine-grained and ambiguous action classes remain difficult for existing methods to distinguish. We therefore propose a self-supervised contrastive learning method with bidirectional knowledge distillation across part streams for skeleton-based action recognition. On the one hand, unlike traditional methods, we perform a pose-based factorization of skeleton sequences into two part streams and employ a single-part-stream contrastive learning method to encode the action features of each stream. On the other hand, we design a contrastive learning framework based on relational knowledge distillation, named cross-part bidirectional distillation (CPBD), to train the upstream self-supervised model more effectively and to improve downstream action recognition accuracy. The proposed recognition framework is evaluated on three datasets (NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD) and achieves state-of-the-art performance, reaching 92.0% accuracy on PKU-MMD Part I under the linear evaluation protocol. Furthermore, the recognition architecture can distinguish more challenging ambiguous action samples, such as "touch head" and "touch neck".
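The cross-part bidirectional distillation idea summarized in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the function names (`cpbd_loss`, `similarity_distribution`), the use of cosine similarity over a memory bank of keys, and the symmetric KL objective are assumptions based on common MoCo-style contrastive pipelines and relational knowledge distillation, where each part stream's similarity distribution supervises the other's.

```python
import math

def softmax(scores, tau=0.1):
    """Turn similarity scores into a temperature-scaled probability distribution."""
    z = [s / tau for s in scores]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similarity_distribution(query, bank, tau=0.1):
    # MoCo-style: distribution of a query embedding over a memory bank of keys.
    return softmax([cosine(query, key) for key in bank], tau)

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cpbd_loss(emb_upper, emb_lower, bank_upper, bank_lower, tau=0.1):
    # Cross-part bidirectional distillation (hypothetical sketch): the two part
    # streams' similarity distributions distill into each other symmetrically.
    p_u = similarity_distribution(emb_upper, bank_upper, tau)
    p_l = similarity_distribution(emb_lower, bank_lower, tau)
    return kl(p_u, p_l) + kl(p_l, p_u)
```

The symmetric sum of the two KL terms is what makes the distillation bidirectional: neither part stream is a fixed teacher, so each regularizes the other during upstream self-supervised training.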
Pages: 21