3D hand pose and shape estimation from monocular RGB via efficient 2D cues

Cited by: 2
Authors
Zhang, Fenghao [1]
Zhao, Lin [2]
Li, Shengling [1]
Su, Wanjuan [2]
Liu, Liman [1]
Tao, Wenbing [2]
Affiliations
[1] South Cent Minzu Univ, Sch Biomed Engn, Hubei Key Lab Med Informat Anal & Tumor Diag & Tre, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2024, Vol. 10, No. 1
Funding
National Natural Science Foundation of China
Keywords
hand; 3D reconstruction; deep learning; image features; 3D mesh;
DOI
10.1007/s41095-023-0346-4
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
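As an illustration of the 2D joint heatmaps mentioned in the abstract, the sketch below shows one common way to read a 2D joint location out of a heatmap (a soft-argmax). It is not taken from the paper: the function name `soft_argmax_2d`, the `beta` sharpness parameter, and the list-of-rows heatmap layout are all assumptions made for this example, and the authors' hand joints encoder may use the heatmaps quite differently.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the softmax-weighted mean (x, y) position of a 2D heatmap.

    heatmap: list of rows, each a list of floats (hypothetical layout).
    beta: sharpness factor; larger values approach a hard argmax (assumed).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    total = 0.0
    x_acc = 0.0
    y_acc = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            weight = math.exp(beta * (value - peak))
            total += weight
            x_acc += weight * x
            y_acc += weight * y
    return x_acc / total, y_acc / total


# A single sharp peak in the top-right corner of a 2 x 3 heatmap.
joint_x, joint_y = soft_argmax_2d([[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]], beta=10.0)
```

With one heatmap per joint, calling the function once per heatmap would give a set of 2D joint coordinates; unlike a hard argmax, the result varies smoothly with the heatmap values.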
Pages: 79-96
Number of pages: 18