Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning

Times cited: 27

Authors:

Spurr, Adrian [1]

Dahiya, Aneesh [1]

Wang, Xi [1]

Zhang, Xucong [1]

Hilliges, Otmar [1]

Affiliation:

[1] Swiss Federal Institute of Technology (ETH Zurich), Department of Computer Science, Zurich, Switzerland

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.01104

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation. Contrastive learning makes use of unlabeled data for representation learning via a loss formulation that encourages the learned feature representations to be invariant under any image transformation. For 3D hand pose estimation, invariance to appearance transformations such as color jitter is also desirable. However, the task requires equivariance under affine transformations, such as rotation and translation. To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation. We experimentally investigate the impact of invariant and equivariant contrastive objectives and show that learning equivariant features leads to better representations for the task of 3D hand pose estimation. Furthermore, we show that standard ResNets with sufficient depth, trained on additional unlabeled data, attain improvements of up to 14.5% in Procrustes-aligned end-point error (PA-EPE) on FreiHAND and thus achieve state-of-the-art performance without any task-specific, specialized architectures. Code and models are available at https://ait.ethz.ch/projects/2021/PeCLR/
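The difference between an invariant and an equivariant contrastive objective can be illustrated with a minimal NumPy sketch. It is not the authors' released code (see the project URL for that); it assumes, for illustration only, that each projection-head output is reshaped into K 2D points and that the only geometric augmentation is an in-plane rotation with a known angle (translation and scaling are omitted). The function names `rotate_points`, `nt_xent` and `equivariant_nt_xent`, and the temperature `tau=0.5`, are hypothetical choices. The equivariant variant undoes each view's rotation in embedding space before applying a standard SimCLR-style NT-Xent loss.

```python
import numpy as np


def rotate_points(z, theta):
    """Rotate a flat embedding, interpreted as K 2D points, by theta radians.

    Assumption for illustration: the embedding length is even and its
    consecutive pairs of entries are treated as (x, y) coordinates.
    """
    pts = z.reshape(-1, 2)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return (pts @ rot.T).reshape(-1)


def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss; row i of z1 and row i of z2 form a positive pair."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of each row's positive partner in the concatenated batch.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row (exp(-inf) contributes 0).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))


def equivariant_nt_xent(z1, z2, theta1, theta2, tau=0.5):
    """Apply the inverse of each view's rotation to its embedding, then contrast."""
    u1 = np.stack([rotate_points(z, -t) for z, t in zip(z1, theta1)])
    u2 = np.stack([rotate_points(z, -t) for z, t in zip(z2, theta2)])
    return nt_xent(u1, u2, tau)
```

Calling `nt_xent(z1, z2)` directly corresponds to the invariant objective: it pulls the two views together even though they were rotated differently, so the encoder is pushed to discard rotation. In the equivariant version, an encoder whose embedding rotates together with the input image is not penalized, because the known rotation is removed before the views are compared.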


Pages: 11210-11219

Number of pages: 10

50 records in total:

[1] Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB
Guo, Shaoxiang
Rigall, Eric
Qi, Lin
Dong, Xinghui
Li, Haiyan
Dong, Junyu
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04) : 1514 - 1525
[2] Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images
Cai, Yujun
Ge, Liuhao
Cai, Jianfei
Yuan, Junsong
COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 678 - 694
[3] CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild
Wandt, Bastian
Rudolph, Marco
Zell, Petrissa
Rhodin, Helge
Rosenhahn, Bodo
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13289 - 13299
[4] Self-supervised 3D hand pose estimation through training by fitting
Wan, Chengde
Probst, Thomas
Van Gool, Luc
Yao, Angela
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10845 - 10854
[5] MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning
Sun, Jiaze
Chen, Zhixiang
Kim, Tae-Kyun
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14406 - 14416
[6] 3D Hand Pose Estimation From Monocular RGB With Feature Interaction Module
Guo, Shaoxiang
Rigall, Eric
Ju, Yakun
Dong, Junyu
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5293 - 5306
[7] Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos
Chen, Liangjian
Lin, Shih-Yao
Xie, Yusheng
Lin, Yen-Yu
Xie, Xiaohui
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1049 - 1058
[8] Self-supervised 3D human pose estimation from video
Gholami, Mohsen
Rezaei, Ahmad
Rhodin, Helge
Ward, Rabab
Wang, Z. Jane
NEUROCOMPUTING, 2022, 488 : 97 - 106
[9] 3D hand pose and shape estimation from monocular RGB via efficient 2D cues
Zhang, Fenghao
Zhao, Lin
Li, Shengling
Su, Wanjuan
Liu, Liman
Tao, Wenbing
COMPUTATIONAL VISUAL MEDIA, 2024, 10 : 79 - 96
