3D hand pose and shape estimation from monocular RGB via efficient 2D cues

Cited by: 2
Authors
Zhang, Fenghao [1]
Zhao, Lin [2]
Li, Shengling [1]
Su, Wanjuan [2]
Liu, Liman [1]
Tao, Wenbing [2]
Affiliations
[1] South Cent Minzu Univ, Sch Biomed Engn, Hubei Key Lab Med Informat Anal & Tumor Diag & Tre, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2024, Vol. 10, No. 1
Funding
National Natural Science Foundation of China;
Keywords
hand; 3D reconstruction; deep learning; image features; 3D mesh;
DOI
10.1007/s41095-023-0346-4
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
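To illustrate why 2D joint heatmaps carry usable spatial detail, the sketch below renders a joint as a Gaussian heatmap and recovers its pixel location with a soft-argmax. This is a generic illustration, not the authors' hand joints encoder: the abstract does not say how the heatmaps are generated or consumed, so the function names, the 32x32 resolution, the Gaussian width, and the temperature are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_heatmap(height, width, center_xy, sigma=2.0):
    """Render one joint as a 2D Gaussian peaked at center_xy = (x, y).

    Hypothetical helper: the paper's actual heatmap format is not given.
    """
    xs = np.arange(width)[None, :]   # column coordinates, shape (1, W)
    ys = np.arange(height)[:, None]  # row coordinates, shape (H, 1)
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def soft_argmax_2d(heatmap, temperature=0.01):
    """Return the expected (x, y) location under a softmax of the heatmap.

    A small temperature sharpens the distribution so the expectation
    stays close to the heatmap peak (assumed value, not from the paper).
    """
    h, w = heatmap.shape
    logits = heatmap.reshape(-1) / temperature
    logits = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs = (probs / probs.sum()).reshape(h, w)
    x = float((probs.sum(axis=0) * np.arange(w)).sum())  # marginal over rows
    y = float((probs.sum(axis=1) * np.arange(h)).sum())  # marginal over columns
    return x, y


if __name__ == "__main__":
    # Example joint location (x=20, y=12) on a 32x32 grid; values are arbitrary.
    hm = gaussian_heatmap(32, 32, (20, 12))
    print(soft_argmax_2d(hm))
```

Unlike a hard argmax, the soft-argmax is differentiable, which is one common reason heatmap-based 2D cues are convenient inside a learned pipeline.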
Pages: 79-96
Page count: 18