Self-Supervised Pretraining Based on Noise-Free Motion Reconstruction and Semantic-Aware Contrastive Learning for Human Motion Prediction

Cited by: 4

Authors:

Li, Qin [1]

Wang, Yong [1]

Affiliations:

[1] Central South University, School of Automation, Changsha 410083, People's Republic of China

Source:

IEEE Transactions on Emerging Topics in Computational Intelligence | 2024, Vol. 8, No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Human motion prediction; self-supervised pretraining; anti-interference; semantic attribute learning; MODELS;

DOI:

10.1109/TETCI.2023.3257262

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human motion prediction aims to forecast future human motions from observed ones. A prediction model typically consists of an encoder that extracts features from the observed motions and a decoder that predicts future motions from the extracted features. However, existing encoders have limited anti-interference and semantic attribute learning abilities, so the prediction results are not fully satisfactory. To solve this problem, we propose two novel pretext tasks, i.e., noise-free motion reconstruction and semantic-aware contrastive learning, to implement self-supervised pretraining of the encoder. The former extracts the features of a noise-added motion to reconstruct the noise-free motion, which improves the encoder's anti-interference ability. The latter shortens the distance between two motions of the same motion category while widening the distance between two motions from different motion categories. This pretext task enables the encoder to capture the semantic commonality of motions within the same category, thereby enhancing the encoder's semantic attribute learning ability. After the self-supervised pretraining, the pretrained encoder is connected to a decoder to build a prediction model called ReLe-GCN, which is fine-tuned for human motion prediction. Results on public datasets show that ReLe-GCN achieves high accuracy in both short-term and long-term motion prediction.
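
The two pretext tasks described in the abstract can be illustrated with a minimal sketch. This is not the ReLe-GCN implementation: the abstract does not state the loss functions, so a mean squared error for reconstruction and a supervised-contrastive-style loss over cosine similarities are assumptions made here. All function names, the noise level `sigma`, and the `temperature` parameter are hypothetical, and the encoder and decoder networks are omitted.

```python
import math
import random


def add_gaussian_noise(motion, sigma, rng):
    """Return a noisy copy of `motion` (a list of frames, each a list of floats).

    Assumption: Gaussian noise; the abstract only says "noise-added motion".
    """
    return [[x + rng.gauss(0.0, sigma) for x in frame] for frame in motion]


def reconstruction_loss(reconstructed, clean):
    """Mean squared error between a reconstructed motion and the noise-free target.

    Assumption: MSE is used purely for illustration.
    """
    total, count = 0.0, 0
    for rec_frame, clean_frame in zip(reconstructed, clean):
        for r, c in zip(rec_frame, clean_frame):
            total += (r - c) ** 2
            count += 1
    return total / count


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def semantic_contrastive_loss(features, labels, temperature=0.5):
    """Contrastive loss that pulls together features sharing a motion category.

    For each anchor, the positives are the other samples with the same label;
    all remaining samples act as negatives. Anchors without positives are skipped.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {
            j: cosine_similarity(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors


# Toy usage: a two-frame "motion" and four 2-D feature vectors from two categories.
rng = random.Random(0)
clean_motion = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]]
noisy_motion = add_gaussian_noise(clean_motion, sigma=0.1, rng=rng)
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_grouped = semantic_contrastive_loss(features, [0, 0, 1, 1])
loss_mixed = semantic_contrastive_loss(features, [0, 1, 0, 1])
```

In this toy setup the loss is lower when features of the same category already lie close together (`loss_grouped`) than when the labels cut across the clusters (`loss_mixed`), which is the behavior the semantic-aware pretext task is meant to encourage in the encoder.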


Pages: 738-751

Page count: 14

References: 56 in total
