JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

Cited by: 31
Authors
Fang, Linpu [1]
Liu, Xingyan [1]
Liu, Li [2]
Xu, Hang [3]
Kang, Wenxiong [1]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland
[3] Huawei Noahs Ark Lab, Hong Kong, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT VI | 2020 / Vol. 12351
Keywords
3D hand pose estimation; Depth image; Graph neural network; Regression
DOI
10.1007/978-3-030-58539-6_8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art methods for 3D hand pose estimation from a single depth image are based on dense predictions, including voxel-to-voxel prediction, point-to-point regression, and pixel-wise estimation. Despite their good performance, these methods have inherent issues, such as a poor trade-off between accuracy and efficiency, and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address these issues. The key ideas are two-fold: (a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; (b) unifying dense pixel-wise offset prediction and direct joint regression for end-to-end training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model the complex dependencies among joints and augment the representation capability of each pixel. Then we densely estimate all pixels' offsets to joints in both the image plane and depth space and calculate the joints' positions by a weighted average over all pixels' predictions, entirely discarding complex post-processing operations. The proposed model is implemented with an efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M parameters. Extensive experiments on multiple 3D hand pose estimation benchmarks demonstrate that the proposed method achieves new state-of-the-art accuracy while running efficiently at about 110 fps on a single NVIDIA 1080Ti GPU. The code is available at https://github.com/fanglinpu/JGR-P2O. (This work was supported in part by the National Natural Science Foundation of China under Grant 61976095 and in part by the Science and Technology Planning Project of Guangdong Province under Grant 2018B030323026. This work was also partially supported by the Academy of Finland.)
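The aggregation step described in the abstract (each pixel predicts an offset to every joint, and joint positions are a weighted average of those per-pixel predictions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `aggregate_joints`, the array shapes, and the use of a softmax to normalize the per-pixel weights are all assumptions made for illustration. The abstract only states that a weighted average is used.

```python
import numpy as np


def aggregate_joints(coords, offsets, logits):
    """Weighted average of per-pixel joint votes (illustrative sketch).

    coords  : (N, 3) array, each pixel's (u, v, depth) position.
    offsets : (J, N, 3) array, predicted offset from each pixel to each joint.
    logits  : (J, N) array, unnormalized per-pixel weights for each joint.
    Returns a (J, 3) array of estimated joint positions.
    """
    # Assumption: weights are normalized with a softmax over pixels.
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each pixel votes for a joint position: its own position plus its offset.
    votes = coords[None, :, :] + offsets  # (J, N, 3)

    # Joint position = weighted average of all pixel votes.
    return (weights[:, :, None] * votes).sum(axis=1)


# Toy example with hypothetical data: a 4x5 "depth image" and 2 joints.
rng = np.random.default_rng(0)
height, width = 4, 5
v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
depth = rng.uniform(-1.0, 1.0, size=(height, width))
coords = np.stack([u, v, depth], axis=-1).reshape(-1, 3).astype(float)

true_joints = np.array([[1.0, 2.0, 0.3], [3.5, 0.5, -0.2]])
# Perfect offsets: every pixel points exactly at each joint.
offsets = true_joints[:, None, :] - coords[None, :, :]
logits = rng.normal(size=(2, height * width))

pred = aggregate_joints(coords, offsets, logits)
```

With perfect offsets every pixel votes for the same location, so any normalized weighting recovers `true_joints`. In a trained network the offsets are noisy, and the weights would let more reliable pixels contribute more to the estimate.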
Pages: 120-137
Page count: 18