G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Cited by: 93
Authors
Chen, Wei [1 ,2 ]
Jia, Xi [1 ]
Chang, Hyung Jin [1 ]
Duan, Jinming [1 ]
Leonardis, Ales [1 ]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England
[2] Natl Univ Def Technol, Sch Comp Sci, Changsha, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
3D OBJECT DETECTION; CLASSIFICATION;
DOI
10.1109/CVPR42600.2020.00429
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, via the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate, in which we train a rotation localization network to estimate the initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate a more accurate rotation, we adopt a rotation residual estimator to estimate the residual between the initial rotation and the ground truth, which boosts initial pose estimation performance. Our proposed G2L-Net runs in real time despite the fact that multiple steps are stacked via the proposed coarse-to-fine framework. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
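The geometric core of the third step described in the abstract can be sketched in a few lines: segmented object points are moved into a local canonical frame by subtracting the predicted translation, and the predicted rotation residual is composed with the initial rotation estimate. This is a minimal illustration of those two operations only; the function names, the use of a boolean segmentation mask, and the matrix-composition convention are assumptions, not the paper's actual implementation.

```python
import numpy as np

def to_local_canonical(points, seg_mask, pred_translation):
    """Transfer segmented object points into a local canonical frame
    by subtracting the predicted object translation.
    (Illustrative sketch; names and conventions are assumed.)"""
    obj_points = points[seg_mask]          # keep only points predicted as object
    return obj_points - pred_translation   # center them at the predicted origin

def compose_rotation(initial_R, residual_R):
    """Refine the initial rotation estimate with the predicted
    rotation residual (composition order is an assumption here)."""
    return residual_R @ initial_R

# Toy usage: three points, one of which is segmented out as background.
points = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [9.0, 9.0, 9.0]])
seg_mask = np.array([True, True, False])
pred_t = np.array([1.0, 2.0, 3.0])
canonical = to_local_canonical(points, seg_mask, pred_t)
refined_R = compose_rotation(np.eye(3), np.eye(3))
```

Working in this translation-free canonical frame is what lets the rotation localization network focus purely on orientation, independent of where the object sits in the camera frame.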
Pages: 4232-4241
Page count: 10
References
47 items total
  • [1] BESL PJ, 1992, P SOC PHOTO-OPT INS, V1611, P586, DOI 10.1117/12.57955
  • [2] Brachmann E, 2014, LECT NOTES COMPUT SC, V8690, P536, DOI 10.1007/978-3-319-10605-2_35
  • [3] Chen W, 2016, INT C PATT RECOG, P1821, DOI 10.1109/ICPR.2016.7899901
  • [4] Chen Wei, 2020, IEEE WINT C APPL COM
  • [5] Chen Y, Medioni G, 1992, Object Modeling by Registration of Multiple Range Images, IMAGE AND VISION COMPUTING, V10(3), P145-155
  • [6] Dai A, Qi CR, Niessner M, 2017, Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis, 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), P6545-6554
  • [7] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [9] Gao XS, Hou XR, Tang JL, Cheng HF, 2003, Complete Solution Classification for the Perspective-Three-Point Problem, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, V25(8), P930-943
  • [10] He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/ICCV.2017.322, 10.1109/TPAMI.2018.2844175]