Contrastive Representation Learning for Hand Shape Estimation

Cited by: 11
Authors
Zimmermann, Christian [1]
Argus, Max [1]
Brox, Thomas [1]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
Source
PATTERN RECOGNITION, DAGM GCPR 2021 | 2021 / Vol. 13024
Keywords
Hand shape estimation; Self-supervised learning; Contrastive learning; Dataset;
DOI
10.1007/978-3-030-92659-5_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work presents improvements in monocular hand shape estimation by building on recent advances in unsupervised learning. We extend momentum contrastive learning and contribute a structured collection of hand images, well suited for visual representation learning, which we call HanCo. We find that the representation learned by established contrastive learning methods can be improved significantly by exploiting advanced background removal techniques and multi-view information. These allow us to generate more diverse instance pairs than those obtained by the augmentations commonly used in exemplar-based approaches. Our method leads to a more suitable representation for the hand shape estimation task and shows a 4.7% reduction in mesh error and a 3.6% improvement in F-score compared to an ImageNet-pretrained baseline. We make our benchmark dataset publicly available to encourage further research in this direction.
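The momentum contrastive learning mentioned in the abstract can be illustrated with a minimal NumPy sketch of an InfoNCE-style loss and a momentum (moving-average) update of a key encoder. This is not the authors' implementation: the function names, the temperature of 0.07, and the momentum of 0.999 are illustrative assumptions, and the encoders are reduced to plain parameter arrays. In the setting the abstract describes, the positive key for a query could be an embedding of the same hand seen from another camera view or with a replaced background, instead of a simple augmentation of the same image.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss (hypothetical helper, temperature is an assumed value).

    query:         (N, D) embeddings from the query encoder
    positive_key:  (N, D) embeddings of the matching instances
    negative_keys: (K, D) embeddings of other instances (e.g. a queue)
    """
    q = l2_normalize(query)
    k_pos = l2_normalize(positive_key)
    k_neg = l2_normalize(negative_keys)

    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)  # (N, 1) positive logits
    l_neg = q @ k_neg.T                               # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature

    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly towards the query-encoder parameters."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

The loss is small when each query is closer to its positive key than to any negative key, which is what pushes embeddings of the same hand under different views or backgrounds together.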
Pages: 250-264
Page count: 15