3D hand pose and shape estimation from monocular RGB via efficient 2D cues

Cited by: 2
Authors
Zhang, Fenghao [1]
Zhao, Lin [2]
Li, Shengling [1]
Su, Wanjuan [2]
Liu, Liman [1]
Tao, Wenbing [2]
Affiliations
[1] South Cent Minzu Univ, Sch Biomed Engn, Hubei Key Lab Med Informat Anal & Tumor Diag & Tre, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2024 / Vol. 10 / Issue 01
Funding
National Natural Science Foundation of China
Keywords
hand; 3D reconstruction; deep learning; image features; 3D mesh;
DOI
10.1007/s41095-023-0346-4
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
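The abstract relies on 2D joint heatmaps as a spatial cue. As a minimal sketch of that representation only (not the authors' network or encoder), the snippet below renders each 2D joint as a Gaussian heatmap and recovers its location from the peak. The function names, the 8x8 grid, the value of `sigma`, and the joint coordinates are all hypothetical choices made for illustration.

```python
import math


def gaussian_heatmap(cx, cy, width, height, sigma=1.5):
    """Render one joint at pixel (cx, cy) as a height x width Gaussian heatmap."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def heatmap_peak(heatmap):
    """Return the (x, y) pixel with the highest response in the heatmap."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


# Hypothetical 2D joint locations in pixels (illustrative values, not dataset data).
joints_2d = [(2, 3), (5, 6)]

# One heatmap per joint; a network would predict these instead of rendering them.
heatmaps = [gaussian_heatmap(x, y, 8, 8) for x, y in joints_2d]

# Taking the peak of each heatmap recovers the original joint location.
recovered = [heatmap_peak(h) for h in heatmaps]
```

A heatmap keeps the spatial neighborhood around each joint rather than a single coordinate pair, which is presumably why the abstract describes it as a source of spatial detail for the pose encoder.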
Pages: 79 - 96
Number of pages: 18