G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Cited by: 104
Authors
Chen, Wei [1, 2]
Jia, Xi [1]
Chang, Hyung Jin [1]
Duan, Jinming [1]
Leonardis, Ales [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England
[2] Natl Univ Def Technol, Sch Comp Sci, Changsha, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
3D OBJECT DETECTION; CLASSIFICATION
DOI
10.1109/CVPR42600.2020.00429
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, via the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate, in which we train a rotation localization network to estimate initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate more accurate rotation, we adopt a rotation residual estimator to estimate the residual between initial rotation and ground truth, which can boost initial pose estimation performance. Our proposed G2L-Net is real-time despite the fact that multiple steps are stacked via the proposed coarse-to-fine framework. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
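The geometric part of the third step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`to_local_canonical`, `refine_rotation`) are hypothetical, the learned networks are replaced by given inputs, and applying the residual by left-multiplication is an assumption that the abstract does not specify.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def to_local_canonical(
    points: Sequence[Point],
    mask: Sequence[bool],
    translation: Point,
) -> List[Point]:
    """Keep the points the segmentation marks as object, then subtract the
    predicted translation so the remaining points are centred on the object."""
    tx, ty, tz = translation
    return [
        (x - tx, y - ty, z - tz)
        for (x, y, z), keep in zip(points, mask)
        if keep
    ]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Plain 3x3 matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def refine_rotation(r_init: Matrix, r_residual: Matrix) -> Matrix:
    """Combine an initial rotation with a predicted residual rotation.
    Assumption: the residual is applied on the left (R = R_res * R_init)."""
    return matmul3(r_residual, r_init)


if __name__ == "__main__":
    # Toy coarse point cloud; the last point is treated as background.
    coarse = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (10.0, 10.0, 10.0)]
    seg_mask = [True, True, False]
    local = to_local_canonical(coarse, seg_mask, (1.0, 1.0, 1.0))
    print(local)
```

In this sketch the segmentation mask and translation stand in for the outputs of the translation localization network, and the two rotation matrices stand in for the outputs of the rotation localization network and the rotation residual estimator.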
Pages: 4232-4241
Page count: 10