Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition

被引:0
|
作者
Yang, Huaigang [1 ]
Zhang, Qieshi [2 ,3 ]
Ren, Ziliang [1 ,2 ]
Yuan, Huaqiang [1 ]
Zhang, Fuyong [1 ]
机构
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, CAS Key Lab Human Machine Intelligence Synergy Sys, Shenzhen, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
基金
中国国家自然科学基金;
关键词
Skeleton-based Action Recognition; Contrastive Learning; Self-attention; Knowledge Distillation; Skeleton; Segmentation; NETWORKS; LSTM;
D O I
10.22967/HCIS.2024.14.070
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Since self-supervised learning does not require a large amount of labelled data, some methods have employed self-supervised contrastive learning for 3D skeleton-based action recognition. As the skeleton sequence is a highly correlated data modality, current work only considers the global skeleton sequence after forming different views by data augmentation and input to the contrastive encoding network. Moreover, it does not focus on the local semantic information of the skeleton that leads to certain fine-grained and ambiguous classes of actions on which existing methods may be more difficult to distinguish. Therefore, we propose a self- supervised contrastive learning method with bidirectional knowledge distillation across part streams for skeleton-based action recognition. On the one hand, unlike traditional methods, a pose based factorization of skeleton sequences is performed to form two partial streams, and employ a single partial stream contrastive learning method to encode action features for each of these two streams. On the other hand, we design a contrastive learning framework based on relational knowledge distillation, named cross-part bidirectional distillation (CPBD), to train the upstream self-supervised model in a more reasonable way, and to improve the downstream action recognition accuracy. The proposed recognition framework is evaluated on three datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD, which achieves the state-of-the-art result performance, and we obtained 92.0% accuracy in PKU-MMD Part I with the linear evaluation protocol. Furthermore, the recognition architecture could distinguish more challenging ambiguous action samples, such as touch head, touch neck, etc.
引用
收藏
页数:21
相关论文
共 50 条
  • [21] A Bidirectional Separated Distillation-Based Cross-Modal Interactive Fusion Network for Skeleton-Based Action Recognition
    Wang, Mingdao
    Zhang, Xianlin
    Chen, Siqi
    Li, Xueming
    Zhang, Yue
    IEEE SENSORS JOURNAL, 2025, 25 (01) : 1814 - 1824
  • [22] DIDA: Dynamic Individual-to-integrateD Augmentation for Self-supervised Skeleton-Based Action Recognition
    Hu, Haobo
    Li, Jianan
    Fan, Hongbin
    Zhao, Zhifu
    Zhou, Yangtao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII, 2025, 15037 : 496 - 510
  • [23] DMMG: Dual Min-Max Games for Self-Supervised Skeleton-Based Action Recognition
    Guan, Shannan
    Yu, Xin
    Huang, Wei
    Fang, Gengfa
    Lu, Haiyan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 395 - 407
  • [24] Multi-Granularity Anchor-Contrastive Representation Learning for Semi-Supervised Skeleton-Based Action Recognition
    Shu, Xiangbo
    Xu, Binqian
    Zhang, Liyan
    Tang, Jinhui
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7559 - 7576
  • [25] SELF-SUPERVISED CONTRASTIVE LEARNING FOR AUDIO-VISUAL ACTION RECOGNITION
    Liu, Yang
    Tan, Ying
    Lan, Haoyuan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1000 - 1004
  • [26] Spatiotemporal Decouple-and-Squeeze Contrastive Learning for Semisupervised Skeleton-Based Action Recognition
    Xu, Binqian
    Shu, Xiangbo
    Zhang, Jiachao
    Dai, Guangzhao
    Song, Yan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (08) : 11035 - 11048
  • [27] Efficient Spatio-Temporal Contrastive Learning for Skeleton-Based 3-D Action Recognition
    Gao, Xuehao
    Yang, Yang
    Zhang, Yimeng
    Li, Maosen
    Yu, Jin-Gang
    Du, Shaoyi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 405 - 417
  • [28] Structural Knowledge Distillation for Efficient Skeleton-Based Action Recognition
    Bian, Cunling
    Feng, Wei
    Wan, Liang
    Wang, Song
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2963 - 2976
  • [29] Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations
    Zhang, Jiahang
    Lin, Lilang
    Liu, Jiaying
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3427 - 3435
  • [30] Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition
    Liu, Mengyuan
    Liu, Hong
    Guo, Tianyu
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2024, 54 (06) : 743 - 752