Pyramid Deep Fusion Network for Two-Hand Reconstruction From RGB-D Images

Cited by: 0

Authors:

Ren, Jinwei ^{[1]}

Zhu, Jianke ^{[1]}

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 07

Keywords:

Three-dimensional displays; Feature extraction; Image reconstruction; Point cloud compression; Shape; Color; Solid modeling; RGB-D fusion; 3D reconstruction; hand pose; end-to-end network; HAND POSE ESTIMATION; REGRESSION;

DOI:

10.1109/TCSVT.2024.3369646

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Accurately recovering the dense 3D mesh of both hands from monocular images poses considerable challenges due to occlusions and projection ambiguity. Most existing methods extract features from color images to estimate root-aligned hand meshes, neglecting the crucial depth and scale information of the real world. Given noisy sensor measurements with limited resolution, depth-based methods predict 3D keypoints rather than a dense mesh. These limitations motivate us to take advantage of the two complementary inputs to acquire dense hand meshes at real-world scale. In this work, we propose an end-to-end framework for recovering dense meshes of both hands, which takes single-view RGB-D image pairs as input. The primary challenge lies in effectively utilizing the two input modalities to mitigate the blurring effects in RGB images and the noise in depth images. Instead of directly treating depth maps as additional channels of the RGB images, we encode the depth information into an unordered point cloud to preserve more geometric detail. Specifically, our framework employs ResNet50 and PointNet++ to derive features from the RGB image and the point cloud, respectively. Additionally, we introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales, which demonstrates superior efficacy compared to previous fusion strategies. Furthermore, we employ a GCN-based decoder to process the fused features and recover the corresponding 3D pose and dense mesh. Comprehensive ablation experiments demonstrate the effectiveness of the proposed fusion algorithm, and our method outperforms state-of-the-art approaches on publicly available datasets. To reproduce the results, we will make our source code and models publicly available at https://github.com/zijinxuxu/PDFNet.
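
The abstract says depth is encoded as an unordered point cloud rather than appended to the RGB image as extra channels. A minimal sketch of one common way to do this, pinhole back-projection, is given below. It is an assumption for illustration and is not confirmed as the authors' implementation; the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth map into camera-space 3D points.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as missing and skipped,
    so the result is an unordered list, not an image-shaped grid.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # missing or invalid sensor measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map (arbitrary units) with one missing pixel.
depth = [[0.0, 0.5], [0.5, 1.0]]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because the points keep their metric depth, the resulting cloud retains real-world scale, which is the information the abstract says RGB-only methods neglect. A network such as PointNet++ would then consume this list; that step is not shown here.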


Pages: 5843-5855

Number of pages: 13

