Self-Supervised Hand Pose Estimation with Regional Depth Correspondence

Cited by: 0
Authors
Wang J.-Y. [1]
Huang W.-T. [1]
Liu C. [2]
Qi Q. [1]
Sun H.-F. [1]
Liao J.-X. [1]
Affiliations
[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing
[2] China Mobile Group Design Institute Co., Ltd., Beijing
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023 / Vol. 51 / No. 06
Funding
National Natural Science Foundation of China;
Keywords
deep learning; depth images; hand pose estimation; regional consistency; self-supervised;
DOI
10.12263/DZXB.20210648
Chinese Library Classification number
TB18 [Ergonomics]; Q98 [Anthropology];
Discipline classification code
030303; 1201;
Abstract
Depth-based 3D hand pose estimation requires manually labelled data to achieve high accuracy and robustness. However, the labelling process is laborious and introduces unavoidable biases. Researchers address this problem with self-supervised methods: they pretrain a model on a synthetic dataset and then fine-tune it on an unlabelled real dataset through model fitting. The biggest challenge is designing the model-fitting term in the fine-tuning stage so that it prevents a severe drop in accuracy. We propose a regional depth correspondence loss, which uses the initial pose estimation results to extract regional representations of the input and output depth maps and transparently divides them into different regions. This allows the network to fine-tune the regions around joints without being affected by the overall domain gap between synthetic and real depth images. The proposed method outperforms the baseline method by 21.9% on the NYU hand pose dataset. © 2023 Chinese Institute of Electronics. All rights reserved.
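To make the idea of comparing depth maps region by region more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the "regions" are square windows centred on the 2D joint locations of an initial pose estimate and that the correspondence term is a mean absolute depth difference inside each window. The function name `regional_depth_loss`, the `radius` parameter, and the window shape are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def regional_depth_loss(input_depth, rendered_depth, joints_uv, radius=8):
    """Average per-joint mean absolute depth difference (illustrative only).

    input_depth, rendered_depth: (H, W) float arrays of depth values.
    joints_uv: (J, 2) array of (u, v) = (column, row) pixel coordinates,
               assumed to come from an initial pose estimate.
    radius: half-size of each square window in pixels (hypothetical).
    """
    input_depth = np.asarray(input_depth, dtype=float)
    rendered_depth = np.asarray(rendered_depth, dtype=float)
    h, w = input_depth.shape
    joints = np.round(np.asarray(joints_uv, dtype=float)).astype(int)

    per_joint = []
    for u, v in joints:
        # Clip the window to the image bounds.
        r0, r1 = max(v - radius, 0), min(v + radius + 1, h)
        c0, c1 = max(u - radius, 0), min(u + radius + 1, w)
        if r0 >= r1 or c0 >= c1:
            continue  # the joint lies entirely outside the image
        patch_in = input_depth[r0:r1, c0:c1]
        patch_out = rendered_depth[r0:r1, c0:c1]
        per_joint.append(np.mean(np.abs(patch_in - patch_out)))

    # With no valid region there is nothing to compare.
    return float(np.mean(per_joint)) if per_joint else 0.0
```

Under these assumptions, a depth discrepancy far from every joint does not contribute to the loss, whereas a whole-image L1 difference would pick it up. That is one simple way to picture how a regional term could be less sensitive to differences elsewhere in the image.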
Pages: 1644-1653
Page count: 9