Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning

Cited by: 27
Authors
Spurr, Adrian [1]
Dahiya, Aneesh [1]
Wang, Xi [1]
Zhang, Xucong [1]
Hilliges, Otmar [1]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Department of Computer Science, Zurich, Switzerland
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Keywords
DOI
10.1109/ICCV48922.2021.01104
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation. Contrastive learning makes use of unlabeled data for the purpose of representation learning via a loss formulation that encourages the learned feature representations to be invariant under any image transformation. For 3D hand pose estimation, invariance to appearance transformations such as color jitter is likewise desirable. However, the task requires equivariance under affine transformations, such as rotation and translation. To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation. We experimentally investigate the impact of invariant and equivariant contrastive objectives and show that learning equivariant features leads to better representations for the task of 3D hand pose estimation. Furthermore, we show that standard ResNets with sufficient depth, trained on additional unlabeled data, attain improvements of up to 14.5% in PA-EPE on FreiHAND and thus achieve state-of-the-art performance without any task-specific, specialized architectures. Code and models are available at https://ait.ethz.ch/projects/2021/PeCLR/
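The difference between the invariant and the equivariant objective described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the "embeddings" are hand-written 2D points standing in for a network's projection output, the encoder is assumed to be perfectly equivariant, only rotation is covered (translation is omitted), and the helper names `rotate`, `flatten`, `cosine`, and `nt_xent` are hypothetical. The loss is a standard NT-Xent (InfoNCE-style) loss for a single anchor.

```python
import math

def rotate(points, angle):
    """Rotate a list of 2D points (x, y) by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def flatten(points):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [v for p in points for v in p]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nt_xent(anchor, positive, negatives, temperature=0.5):
    """NT-Xent loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Assumption: the projection output is interpreted as a set of 2D points.
z_a = [(1.0, 0.0), (0.0, 2.0)]        # embedding of the original view
angle = math.pi / 2                   # rotation applied to create view B
z_b = rotate(z_a, angle)              # what an equivariant encoder would output
z_other = [(-1.0, 1.0), (2.0, 0.5)]   # embedding of a different sample

# Invariant objective: compare the two embeddings directly.
invariant_loss = nt_xent(flatten(z_a), flatten(z_b), [flatten(z_other)])

# Equivariant objective: undo the known rotation in embedding space first.
z_b_aligned = rotate(z_b, -angle)
equivariant_loss = nt_xent(flatten(z_a), flatten(z_b_aligned), [flatten(z_other)])

print(invariant_loss, equivariant_loss)
```

In this toy setting the aligned positive pair has cosine similarity of about 1, so the equivariant loss is lower than the invariant one. An invariant objective would instead push the encoder to discard the rotation, which is information a pose estimator needs.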
Pages: 11210 - 11219
Number of pages: 10
Related papers
50 in total
  • [1] Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB
    Guo, Shaoxiang
    Rigall, Eric
    Qi, Lin
    Dong, Xinghui
    Li, Haiyan
    Dong, Junyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04) : 1514 - 1525
  • [2] Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images
    Cai, Yujun
    Ge, Liuhao
    Cai, Jianfei
    Yuan, Junsong
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 678 - 694
  • [3] CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild
    Wandt, Bastian
    Rudolph, Marco
    Zell, Petrissa
    Rhodin, Helge
    Rosenhahn, Bodo
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13289 - 13299
  • [4] Self-supervised 3D hand pose estimation through training by fitting
    Wan, Chengde
    Probst, Thomas
    Van Gool, Luc
    Yao, Angela
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10845 - 10854
  • [5] MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning
    Sun, Jiaze
    Chen, Zhixiang
    Kim, Tae-Kyun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14406 - 14416
  • [6] 3D Hand Pose Estimation From Monocular RGB With Feature Interaction Module
    Guo, Shaoxiang
    Rigall, Eric
    Ju, Yakun
    Dong, Junyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5293 - 5306
  • [7] 3D Hand Pose Estimation from Monocular RGB with Feature Interaction Module
    Guo, Shaoxiang
    Rigall, Eric
    Ju, Yakun
    Dong, Junyu
    IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (08) : 5293 - 5306
  • [8] Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos
    Chen, Liangjian
    Lin, Shih-Yao
    Xie, Yusheng
    Lin, Yen-Yu
    Xie, Xiaohui
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1049 - 1058
  • [9] Self-supervised 3D human pose estimation from video
    Gholami, Mohsen
    Rezaei, Ahmad
    Rhodin, Helge
    Ward, Rabab
    Wang, Z. Jane
    NEUROCOMPUTING, 2022, 488 : 97 - 106
  • [10] 3D hand pose and shape estimation from monocular RGB via efficient 2D cues
    Fenghao Zhang
    Lin Zhao
    Shengling Li
    Wanjuan Su
    Liman Liu
    Wenbing Tao
    Computational Visual Media, 2024, 10 : 79 - 96