X-Invariant Contrastive Augmentation and Representation Learning for Semi-Supervised Skeleton-Based Action Recognition

Cited by: 80

Authors:

Xu, Binqian [1]

Shu, Xiangbo [1]

Song, Yan [1]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Skeleton; Representation learning; Joints; Bones; Semisupervised learning; Recurrent neural networks; Hidden Markov models; Action recognition; skeleton; semi-supervised; contrastive learning;

DOI:

10.1109/TIP.2022.3175605

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semi-supervised skeleton-based action recognition is a challenging problem due to insufficient labeled data. For addressing this problem, some representative methods leverage contrastive learning to obtain more features from the pre-augmented skeleton actions. Such methods usually adopt a two-stage way: first randomly augment samples, and then learn their representations via contrastive learning. Since skeleton samples have already been randomly augmented, the representation ability of the subsequent contrastive learning is limited due to the inconsistency between the augmentations and representations. Thus, we propose a novel X-invariant Contrastive Augmentation and Representation learning (X-CAR) framework to thoroughly obtain rotate-shear-scale (X for short) invariant features by learning augmentations and representations of skeleton sequences in a one-stage way. In X-CAR, a new Adaptive-combination Augmentation (AA) mechanism is designed to rotate, shear, and scale the skeletons by learnable controlling factors in an adaptive way rather than a random way. Here, such controlling factors are also learned in the whole contrastive learning process, which can facilitate the consistency between the learned augmentations and representations of skeleton sequences. In addition, we relax the pre-definition of positive and negative samples to avoid the confusing allocation of ambiguous samples, and present a new Pull-Push Contrastive Loss (PPCL) to pull the augmenting skeleton close to the original skeleton, while push far away from the other skeletons. Experimental results on both NTU RGB+D and North-Western UCLA datasets show that the proposed X-CAR achieves better accuracy compared with other competitive methods in the semi-supervised scenario.
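As a rough illustration of the rotate, shear, and scale transforms named in the abstract, the NumPy sketch below applies all three to a skeleton sequence through a single 3x3 matrix. The function names, the six-factor shear parameterization, and the composition order are assumptions made for illustration and are not taken from the paper. In X-CAR the controlling factors are learned during contrastive training; here they are plain arguments, so this shows only the geometry, not the learning.

```python
import numpy as np


def rotation_matrix(angles):
    """Compose rotations about the x, y, and z axes (angles in radians)."""
    ax, ay, az = angles
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx


def shear_matrix(factors):
    """Shear matrix with six off-diagonal factors (assumed parameterization)."""
    s = np.asarray(factors, dtype=float)
    return np.array([[1.0, s[0], s[1]],
                     [s[2], 1.0, s[3]],
                     [s[4], s[5], 1.0]])


def scale_matrix(factors):
    """Per-axis scaling matrix."""
    return np.diag(np.asarray(factors, dtype=float))


def augment(skeleton, angles, shear, scale):
    """Apply rotate -> shear -> scale to a skeleton of shape (T, J, 3).

    The composition order is an assumption for this sketch.
    """
    m = scale_matrix(scale) @ shear_matrix(shear) @ rotation_matrix(angles)
    # Each joint p (a row vector) is mapped to m @ p.
    return skeleton @ m.T


# Toy sequence: 4 frames, 5 joints, 3D coordinates.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(4, 5, 3))
augmented = augment(sequence,
                    angles=(0.1, -0.2, 0.3),
                    shear=(0.05, 0.0, 0.0, -0.05, 0.0, 0.0),
                    scale=(1.1, 0.9, 1.0))
```

To make the factors learnable in the sense the abstract describes, the same matrices would need to be built inside an automatic-differentiation framework so that gradients of the contrastive loss reach the angles, shear factors, and scale factors.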


Pages: 3852-3867

Page count: 16

50 records in total

[21] Semi-Supervised Group Emotion Recognition Based on Contrastive Learning [J].

Zhang, Jiayi; Wang, Xingzhi; Zhang, Dong; Lee, Dah-Jye.

ELECTRONICS, 2022, 11 (23)

[22] Cross-stream contrastive learning for self-supervised skeleton-based action recognition [J].

Li, Ding; Tang, Yongqiang; Zhang, Zhizhong; Zhang, Wensheng.

IMAGE AND VISION COMPUTING, 2023, 135

[23] Spatial Temporal Enhanced Contrastive and Pretext Learning for Skeleton-based Action Representation [J].

Zhan, Yiwen; Chen, Yuchen; Ren, Pengfei; Sun, Haifeng; Wang, Jingyu; Qi, Qi; Liao, Jianxin.

ASIAN CONFERENCE ON MACHINE LEARNING, VOL 157, 2021, 157 :534-547

[24] Causality-inspired representation learning for weakly supervised skeleton-based action recognition [J].

Wang, Kun; Cao, Jiuxin; Ge, Jiawei; Liu, Chang; Liu, Bo.

KNOWLEDGE-BASED SYSTEMS, 2025, 326

[25] Multi-Augmentation-Based Contrastive Learning for Semi-Supervised Learning [J].

Wang, Jie; Yang, Jie; He, Jiafan; Peng, Dongliang.

ALGORITHMS, 2024, 17 (03)

[26] JointContrast: Skeleton-Based Interaction Recognition with New Representation and Contrastive Learning [J].

Zhang, Ji; Jia, Xiangze; Wang, Zhen; Luo, Yonglong; Chen, Fulong; Yang, Gaoming; Zhao, Lihui.

ALGORITHMS, 2023, 16 (04)

[27] A Short Survey on Deep Learning for Skeleton-based Action Recognition [J].

Wang, Wei; Zhang, Yu-Dong.

COMPANION PROCEEDINGS OF THE 14TH IEEE/ACM INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC'21 COMPANION), 2021,

[28] Semi-Supervised Contrastive Learning for Human Activity Recognition [J].

Liu, Dongxin; Abdelzaher, Tarek.

17TH ANNUAL INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SENSOR SYSTEMS (DCOSS 2021), 2021, :45-53

[29] Self-Supervised Representation Learning for Skeleton-Based Group Activity Recognition [J].

Bian, Cunling; Feng, Wei; Wang, Song.

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, :5990-5998

[30] Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition [J].

Wu, Yushan; Xu, Zengmin; Yuan, Mengwei; Tang, Tianchi; Meng, Ruxing; Wang, Zhongyuan.

MULTIMEDIA SYSTEMS, 2024, 30 (05)
