3D hand pose and shape estimation from monocular RGB via efficient 2D cues

Cited by: 2
Authors
Zhang, Fenghao [1]
Zhao, Lin [2]
Li, Shengling [1]
Su, Wanjuan [2]
Liu, Liman [1]
Tao, Wenbing [2]
Affiliations
[1] South Cent Minzu Univ, Sch Biomed Engn, Hubei Key Lab Med Informat Anal & Tumor Diag & Tre, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2024 / Vol. 10 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
hand; 3D reconstruction; deep learning; image features; 3D mesh;
DOI
10.1007/s41095-023-0346-4
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
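As a rough illustration of the 2D joint heatmaps mentioned in the abstract, the sketch below renders one Gaussian heatmap per joint and recovers each joint location from the heatmap peak. It is not the authors' implementation: the function names `render_heatmaps` and `decode_heatmaps`, the 64x64 resolution, the Gaussian width `sigma`, and the random joint positions are all assumptions made for this example. The count of 21 joints follows the FreiHAND keypoint convention.

```python
import numpy as np

def render_heatmaps(joints_xy, size=64, sigma=2.0):
    """Render one Gaussian heatmap per 2D joint (hypothetical helper).

    joints_xy: array of shape (J, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (J, size, size) with peak value 1 at each joint.
    """
    ys, xs = np.mgrid[0:size, 0:size]            # pixel grids, each (size, size)
    dx = xs[None] - joints_xy[:, 0, None, None]  # (J, size, size)
    dy = ys[None] - joints_xy[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

def decode_heatmaps(heatmaps):
    """Recover (x, y) per joint as the location of each heatmap's peak."""
    num_joints, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs, ys], axis=1)

# Assumed toy input: 21 random integer joint positions in a 64x64 image.
rng = np.random.default_rng(0)
joints = rng.integers(0, 64, size=(21, 2))
heatmaps = render_heatmaps(joints.astype(float))
recovered = decode_heatmaps(heatmaps)
```

A heatmap keeps the spatial neighborhood around each joint instead of collapsing it to two numbers, which is the kind of spatial detail the abstract says the hand joints encoder learns from.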
Pages: 79 - 96
Page count: 18
Related papers
50 in total
  • [1] 3D hand pose and shape estimation from monocular RGB via efficient 2D cues
    Fenghao Zhang
    Lin Zhao
    Shengling Li
    Wanjuan Su
    Liman Liu
    Wenbing Tao
    Computational Visual Media, 2024, 10 : 79 - 96
  • [2] Multiple-Hand 2D Pose Estimation From a Monocular RGB Image
    Mishra, Purnendu
    Sarawadekar, Kishor
    IEEE ACCESS, 2024, 12 : 40722 - 40735
  • [3] 3D Hand Pose Estimation From Monocular RGB With Feature Interaction Module
    Guo, Shaoxiang
    Rigall, Eric
    Ju, Yakun
    Dong, Junyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5293 - 5306
  • [4] 3D Hand Pose Estimation from Monocular RGB with Feature Interaction Module
    Guo, Shaoxiang
    Rigall, Eric
    Ju, Yakun
    Dong, Junyu
    IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (08) : 5293 - 5306
  • [5] 3D Hand Shape and Pose Estimation based on 2D Hand Keypoints
    Drosakis, Drosakis
    Argyros, Antonis
    PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2023, 2023 : 148 - 153
  • [6] 3D Hand Shape and Pose Estimation from a Single RGB Image
    Ge, Liuhao
    Ren, Zhou
    Li, Yuncheng
    Xue, Zehao
    Wang, Yingying
    Cai, Jianfei
    Yuan, Junsong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019 : 10825 - 10834
  • [7] Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
    Tekin, Bugra
    Marquez-Neila, Pablo
    Salzmann, Mathieu
    Fua, Pascal
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 3961 - 3970
  • [8] Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning
    Spurr, Adrian
    Dahiya, Aneesh
    Wang, Xi
    Zhang, Xucong
    Hilliges, Otmar
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 11210 - 11219
  • [9] Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images
    Cai, Yujun
    Ge, Liuhao
    Cai, Jianfei
    Yuan, Junsong
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 678 - 694
  • [10] 3D interacting hand pose and shape estimation from a single RGB image
    Gao, Chengying
    Yang, Yujia
    Li, Wensheng
    NEUROCOMPUTING, 2022, 474 : 25 - 36