Global and Local Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition

Cited by: 6

Authors:

Hu, Jinhua [1]

Hou, Yonghong [1]

Guo, Zihui [2]

Gao, Jiajun [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Tianjin Chengjian Univ, Sch Comp & Informat Engn, Tianjin 300384, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 11

Keywords:

Self-supervised learning; 3D action recognition; skeleton; contrastive learning

DOI:

10.1109/TCSVT.2024.3410301

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

Contrastive learning for self-supervised skeleton-based action recognition has recently received attention. It has been observed that local crops, which contain partial action sequences, can predict action patterns, which is advantageous for skeleton-based action recognition. This paper proposes a Global and Local Contrastive Learning framework (skeleton-logoCLR) with two contrastive learning routes, Global-to-Global and Global-to-Local, which utilize the similarity between global and local crops of the same skeleton sequence. Specifically, in the Global-to-Global route, we design Temporal Attention Crop-Resize (TACR) to learn global semantic information by maximizing the retention of the action region in the temporal dimension. In the Global-to-Local route, the proposed Skeleton-logo Augmentation is devised to concatenate two local crops from different sequences for local semantic learning. Moreover, instead of being fused directly, the losses of the two routes are combined in a cascaded manner through the Self-Adaptive Training Strategy (SATS) to achieve stronger generalization performance. Extensive experiments are conducted on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The results demonstrate that the proposed method achieves remarkable performance.


Pages: 10578-10589

Page count: 12
