Self-Supervised Hand Pose Estimation with Regional Depth Correspondence

Cited by: 0

Authors:

Wang J.-Y. [1]

Huang W.-T. [1]

Liu C. [2]

Qi Q. [1]

Sun H.-F. [1]

Liao J.-X. [1]

Affiliations:

[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing

[2] China Mobile Group Design Institute Co., Ltd., Beijing

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023, Vol. 51, No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

deep learning; depth images; hand pose estimation; regional consistency; self-supervised;

DOI:

10.12263/DZXB.20210648

Chinese Library Classification (CLC) number:

TB18 [Ergonomics]; Q98 [Anthropology];

Discipline classification code:

030303 ; 1201 ;

Abstract:

Depth-based 3D hand pose estimation requires manually labelled data to achieve high accuracy and robustness. However, the labelling process is laborious and introduces inevitable biases. Researchers address this problem with self-supervised methods: they pretrain a model on a synthetic dataset, then fine-tune it on an unlabelled real dataset through model fitting. The biggest challenge is designing the model-fitting term in the fine-tuning stage so that it prevents a severe drop in accuracy. We propose the regional depth correspondence loss, which uses the initial pose estimation results to extract regional representations of the input and output depth maps and transparently divides them into different regions. This allows the network to fine-tune the regions around joints without being affected by the overall domain gap between synthetic and real depth images. The proposed method outperforms the baseline method by 21.9% on the NYU hand pose dataset. © 2023 Chinese Institute of Electronics. All rights reserved.
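The idea of comparing depth only in regions around joints can be illustrated with a minimal sketch. This is not the authors' formulation: the square patch shape, the mean absolute difference, and the names `crop_region`, `regional_depth_loss`, `joints_px`, and `half` are all assumptions made for illustration, since the abstract does not specify how regions are extracted or compared.

```python
from typing import List, Sequence, Tuple

Depth = Sequence[Sequence[float]]  # row-major depth map (units are an assumption, e.g. mm)


def crop_region(depth: Depth, center: Tuple[int, int], half: int) -> List[float]:
    """Flattened square patch around `center` (row, col), clipped to the image bounds."""
    rows, cols = len(depth), len(depth[0])
    r0, c0 = center
    return [
        depth[r][c]
        for r in range(max(0, r0 - half), min(rows, r0 + half + 1))
        for c in range(max(0, c0 - half), min(cols, c0 + half + 1))
    ]


def regional_depth_loss(
    observed: Depth,
    rendered: Depth,
    joints_px: Sequence[Tuple[int, int]],
    half: int = 1,
) -> float:
    """Hypothetical regional loss: mean absolute depth difference per joint patch,
    averaged over joints. Pixels outside every patch do not contribute."""
    if not joints_px:
        raise ValueError("at least one joint location is required")
    per_joint = []
    for center in joints_px:
        a = crop_region(observed, center, half)  # patch from the input depth map
        b = crop_region(rendered, center, half)  # same patch from the fitted model's depth map
        per_joint.append(sum(abs(x - y) for x, y in zip(a, b)) / len(a))
    return sum(per_joint) / len(per_joint)
```

In this sketch the joint pixel locations would come from the initial pose estimate, so a mismatch far from every joint leaves the loss unchanged, which is the property the abstract attributes to the regional term.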

References

Pages: 1644-1653

Page count: 9

27 entries in total

[11]

MOON G, CHANG J Y, LEE K M, V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map, Computer Vision and Pattern Recognition (CVPR), pp. 5079-5088, (2018)

[12]

YUAN S X, YE Q, STENGER B, et al., BigHand2.2M benchmark: hand pose dataset and state of the art analysis, Computer Vision and Pattern Recognition (CVPR), pp. 2605-2613, (2017)

[13]

TOMPSON J, STEIN M, LECUN Y, et al., Real-time continuous pose recovery of human hands using convolutional networks, ACM Transactions on Graphics (TOG), 169, 33, pp. 1-10, (2014)

[14]

WAN C D, PROBST T, VAN GOOL L, et al., Dual grid net: hand mesh vertex regression from single depth maps, European Conference on Computer Vision (ECCV), pp. 442-459, (2020)

[15]

DIBRA E, WOLF T, OZTIRELI C, et al., How to refine 3D hand pose estimation from unlabelled depth data, International Conference on 3D Vision, pp. 135-144, (2017)

[16]

WAN C D, PROBST T, VAN GOOL L, et al., Self-supervised 3D hand pose estimation through training by fitting, Computer Vision and Pattern Recognition (CVPR), pp. 10853-10862, (2019)

[17]

MELAX S, KESELMAN L, ORSTEN S, Dynamics based 3D skeletal hand tracking, Proceedings of Graphics Interface 2013, pp. 63-70, (2013)

[18]

SINHA A, CHOI C, RAMANI K, DeepHand: robust hand pose estimation by completing a matrix imputed with deep features, Computer Vision and Pattern Recognition (CVPR), pp. 4150-4158, (2016)

[19]

ZHANG H, BO Z H, YONG J H, et al., InteractionFusion: real-time reconstruction of hand poses and deformable objects in hand-object interactions, ACM Transactions on Graphics, 38, 4, pp. 1-11, (2019)

[20]

SUPANCIC J S, ROGEZ G, YANG Y, et al., Depth-based hand pose estimation: methods, data, and challenges, International Journal of Computer Vision, 126, 11, pp. 1180-1198, (2018)
