Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition

Citations: 0
Authors
Yang, Huaigang [1 ]
Zhang, Qieshi [2 ,3 ]
Ren, Ziliang [1 ,2 ]
Yuan, Huaqiang [1 ]
Zhang, Fuyong [1 ]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, CAS Key Lab Human Machine Intelligence Synergy Sys, Shenzhen, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based Action Recognition; Contrastive Learning; Self-attention; Knowledge Distillation; Skeleton; Segmentation; Networks; LSTM;
DOI
10.22967/HCIS.2024.14.070
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Since self-supervised learning does not require large amounts of labelled data, several methods have applied self-supervised contrastive learning to 3D skeleton-based action recognition. Although the skeleton sequence is a highly correlated data modality, current work considers only the global skeleton sequence, which is formed into different views by data augmentation and fed into the contrastive encoding network. It neglects the local semantic information of the skeleton, so certain fine-grained and ambiguous action classes remain difficult for existing methods to distinguish. We therefore propose a self-supervised contrastive learning method with bidirectional knowledge distillation across part streams for skeleton-based action recognition. On the one hand, unlike traditional methods, we perform a pose-based factorization of skeleton sequences into two part streams and employ a single-part-stream contrastive learning method to encode the action features of each stream. On the other hand, we design a contrastive learning framework based on relational knowledge distillation, named cross-part bidirectional distillation (CPBD), to train the upstream self-supervised model more effectively and to improve downstream action recognition accuracy. The proposed recognition framework is evaluated on three datasets (NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD) and achieves state-of-the-art performance, reaching 92.0% accuracy on PKU-MMD Part I under the linear evaluation protocol. Furthermore, the recognition architecture can distinguish more challenging ambiguous action samples, such as "touch head" and "touch neck".
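The cross-part bidirectional distillation idea summarized in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the function names (`cpbd_loss`, `similarity_distribution`), the use of cosine similarity over a memory bank of keys, and the symmetric KL objective are assumptions based on common MoCo-style contrastive pipelines and relational knowledge distillation, where each part stream's similarity distribution supervises the other's.

```python
import math

def softmax(scores, tau=0.1):
    """Turn similarity scores into a temperature-scaled probability distribution."""
    z = [s / tau for s in scores]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similarity_distribution(query, bank, tau=0.1):
    # MoCo-style: distribution of a query embedding over a memory bank of keys.
    return softmax([cosine(query, key) for key in bank], tau)

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cpbd_loss(emb_upper, emb_lower, bank_upper, bank_lower, tau=0.1):
    # Cross-part bidirectional distillation (hypothetical sketch): the two part
    # streams' similarity distributions distill into each other symmetrically.
    p_u = similarity_distribution(emb_upper, bank_upper, tau)
    p_l = similarity_distribution(emb_lower, bank_lower, tau)
    return kl(p_u, p_l) + kl(p_l, p_u)
```

The symmetric sum of the two KL terms is what makes the distillation bidirectional: neither part stream is a fixed teacher, so each regularizes the other during upstream self-supervised training.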
Pages: 21