Contrastive Learning with Cross-Part Bidirectional Distillation for Self-supervised Skeleton-Based Action Recognition

Cited by: 0
|
Authors
Yang, Huaigang [1 ]
Zhang, Qieshi [2 ,3 ]
Ren, Ziliang [1 ,2 ]
Yuan, Huaqiang [1 ]
Zhang, Fuyong [1 ]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, CAS Key Lab Human Machine Intelligence Synergy Sys, Shenzhen, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based Action Recognition; Contrastive Learning; Self-attention; Knowledge Distillation; Skeleton; Segmentation; NETWORKS; LSTM;
DOI
10.22967/HCIS.2024.14.070
Chinese Library Classification
TP [Automation & Computer Technology];
Subject Classification Code
0812 ;
Abstract
Since self-supervised learning does not require large amounts of labelled data, several methods have applied self-supervised contrastive learning to 3D skeleton-based action recognition. Although the skeleton sequence is a highly correlated data modality, current work considers only the global skeleton sequence, which is augmented into different views and fed into the contrastive encoding network; it neglects the local semantic information of the skeleton, so existing methods find certain fine-grained and ambiguous action classes harder to distinguish. We therefore propose a self-supervised contrastive learning method with bidirectional knowledge distillation across part streams for skeleton-based action recognition. On the one hand, unlike traditional methods, we perform a pose-based factorization of skeleton sequences to form two part streams and apply single-part-stream contrastive learning to encode action features for each stream. On the other hand, we design a contrastive learning framework based on relational knowledge distillation, named cross-part bidirectional distillation (CPBD), to train the upstream self-supervised model more effectively and improve downstream action recognition accuracy. The proposed framework is evaluated on three datasets, NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD, achieving state-of-the-art performance, including 92.0% accuracy on PKU-MMD Part I under the linear evaluation protocol. Furthermore, the architecture can distinguish more challenging ambiguous action samples, such as "touch head" and "touch neck."
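The abstract describes the method only at a high level: per-stream contrastive learning plus a bidirectional relational distillation between the two part streams. As an illustrative sketch only (the function names, the InfoNCE form of the contrastive loss, and the symmetric-KL choice for the relational term are assumptions here, not the paper's actual formulation), the two ingredients might look like this in NumPy:

```python
import numpy as np

def info_nce(q, k, temperature=0.07):
    """InfoNCE contrastive loss for one part stream: each query's positive
    key sits at the same index; all other keys act as negatives."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives on diagonal

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_distill(sim_a, sim_b, temperature=0.1):
    """Symmetric KL divergence between the two streams' pairwise-similarity
    distributions: each stream teaches the other (relational distillation)."""
    p = softmax(sim_a / temperature)
    q = softmax(sim_b / temperature)
    kl_ab = np.sum(p * (np.log(p) - np.log(q)), axis=1).mean()
    kl_ba = np.sum(q * (np.log(q) - np.log(p)), axis=1).mean()
    return kl_ab + kl_ba
```

In this sketch, a total loss would sum the per-stream InfoNCE terms and the bidirectional distillation term, so each part stream is trained both against its own augmented views and against the relational structure learned by the other stream.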
Pages: 21
Related Papers
50 records
  • [41] Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
    Lin, Lilang
    Wu, Lehong
    Zhang, Jiahang
    Wang, Jiaying
    COMPUTER VISION - ECCV 2024, PT XXVI, 2025, 15084 : 75 - 92
  • [42] Cross-Scale Spatiotemporal Refinement Learning for Skeleton-Based Action Recognition
    Zhang, Yu
    Sun, Zhonghua
    Dai, Meng
    Feng, Jinchao
    Jia, Kebin
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 441 - 445
  • [43] Skeleton-based Action Recognition via Adaptive Cross-Form Learning
    Wang, Xuanhan
    Dai, Yan
    Gao, Lianli
    Song, Jingkuan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1670 - 1678
  • [44] JointContrast: Skeleton-Based Interaction Recognition with New Representation and Contrastive Learning
    Zhang, Ji
    Jia, Xiangze
    Wang, Zhen
    Luo, Yonglong
    Chen, Fulong
    Yang, Gaoming
    Zhao, Lihui
    ALGORITHMS, 2023, 16 (04)
  • [45] Self-supervised group meiosis contrastive learning for EEG-based emotion recognition
    Haoning Kan
    Jiale Yu
    Jiajin Huang
    Zihe Liu
    Heqian Wang
    Haiyan Zhou
    Applied Intelligence, 2023, 53 : 27207 - 27225
  • [46] Contrastive Self-Supervised Learning for Sensor-Based Human Activity Recognition: A Review
    Chen, Hui
    Gouin-Vallerand, Charles
    Bouchard, Kevin
    Gaboury, Sebastien
    Couture, Melanie
    Bier, Nathalie
    Giroux, Sylvain
    IEEE ACCESS, 2024, 12 : 152511 - 152531
  • [48] Global-local contrastive multiview representation learning for skeleton-based action
    Bian, Cunling
    Feng, Wei
    Meng, Fanbo
    Wang, Song
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229
  • [49] A Short Survey on Deep Learning for Skeleton-based Action Recognition
    Wang, Wei
    Zhang, Yu-Dong
    COMPANION PROCEEDINGS OF THE 14TH IEEE/ACM INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC'21 COMPANION), 2021,
  • [50] Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition
    Du, Yong
    Fu, Yun
    Wang, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (07) : 3010 - 3022