Global hand pose estimation based on pixel voting

Cited by: 0
Authors
Lin J. [1 ]
Li D. [1 ]
Chen C. [1 ]
Zhao Z. [1 ]
Affiliations
[1] School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao
Source
Guangxue Jingmi Gongcheng/Optics and Precision Engineering | 2022, Vol. 30, No. 19
Keywords
Convolutional Neural Network (CNN); deep learning; depth image; hand pose estimation; synthetic dataset
DOI
10.37188/OPE.20223019.2379
Abstract
Global hand pose estimation under changing gestures remains a challenging task in computer vision. To reduce the large errors in this task, a method based on pixel voting was proposed. First, a convolutional neural network with an encoder-decoder structure was established to generate feature maps of semantic and pose information. Second, hand pixel positions and pixel-wise pose votes were obtained from the feature maps using semantic segmentation and pose estimation branches, respectively. Finally, the pose votes of the hand pixels were aggregated to obtain the global pose estimate. In addition, to address the scarcity of global hand pose datasets, a procedure for generating synthetic human-hand datasets was established using the OpenSceneGraph 3D rendering engine and a 3D human hand model; this procedure can generate depth images and global pose labels of human hands under different gestures. Experimental results show that the average error of global hand pose estimation based on pixel voting is 5.036°, verifying that the proposed method can robustly and accurately estimate global hand poses from depth images. © 2022 Guangxue Jingmi Gongcheng/Optics and Precision Engineering. All rights reserved.
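The following is a minimal sketch of the aggregation step described in the abstract: per-pixel pose votes are kept only for pixels the segmentation branch labels as hand, then combined into a single global pose. The abstract does not specify the exact aggregation rule or the pose parameterization, so the mean over three rotation angles, the array names (seg_logits, pose_votes), and the threshold are illustrative assumptions, not the authors' implementation.

# Sketch of pixel-voting aggregation (assumed mean over hand pixels).
# seg_logits: H x W segmentation scores; pose_votes: H x W x 3 per-pixel pose votes.
import numpy as np

def aggregate_pose_votes(seg_logits: np.ndarray,
                         pose_votes: np.ndarray,
                         threshold: float = 0.0) -> np.ndarray:
    """Average per-pixel pose votes over pixels segmented as hand."""
    hand_mask = seg_logits > threshold           # binary hand segmentation
    if not hand_mask.any():                      # no hand pixels detected
        return np.zeros(pose_votes.shape[-1])
    hand_votes = pose_votes[hand_mask]           # (num_hand_pixels, 3)
    return hand_votes.mean(axis=0)               # aggregated global pose

if __name__ == "__main__":
    # Toy usage with random maps standing in for the two network branches.
    rng = np.random.default_rng(0)
    seg_logits = rng.normal(size=(96, 96))
    pose_votes = rng.normal(size=(96, 96, 3)) * 10.0
    print("estimated global pose (deg):", aggregate_pose_votes(seg_logits, pose_votes))

Restricting the vote pool to segmented hand pixels is what makes the estimate robust to background clutter; a robust aggregator (e.g., a per-dimension median) could be substituted for the mean without changing the overall pipeline.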
Pages: 2379-2389
Page count: 10