G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Cited by: 104

Authors:

Chen, Wei [1,2]

Jia, Xi [1]

Chang, Hyung Jin [1]

Duan, Jinming [1]

Leonardis, Ales [1]

Affiliations:

[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England

[2] Natl Univ Def Technol, Sch Comp Sci, Changsha, Peoples R China

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

3D OBJECT DETECTION; CLASSIFICATION;

DOI:

10.1109/CVPR42600.2020.00429

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, via the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate, in which we train a rotation localization network to estimate initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate more accurate rotation, we adopt a rotation residual estimator to estimate the residual between initial rotation and ground truth, which can boost initial pose estimation performance. Our proposed G2L-Net is real-time despite the fact that multiple steps are stacked via the proposed coarse-to-fine framework. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
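
The geometric bookkeeping between the steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the networks themselves are omitted, and the segmentation mask, predicted translation, and the two rotations are hard-coded stand-ins for network outputs. The helper names (`to_local_canonical`, `refine_rotation`, `rot_z`, `matmul`) are hypothetical, and composing the residual by left-multiplication is an assumption, since the abstract does not state the convention.

```python
import math

def to_local_canonical(points, mask, translation):
    """Keep the points the (assumed) segmentation marks as object and
    shift them so the predicted object translation becomes the origin."""
    tx, ty, tz = translation
    return [(x - tx, y - ty, z - tz)
            for (x, y, z), keep in zip(points, mask) if keep]

def rot_z(angle):
    """3x3 rotation matrix about the z-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def refine_rotation(r_init, r_residual):
    """Apply a residual rotation to an initial estimate.
    Assumption: the residual is left-multiplied onto the initial rotation."""
    return matmul(r_residual, r_init)

# Stand-ins for network outputs (hypothetical values).
coarse_points = [(1.0, 2.0, 3.0), (1.5, 2.0, 3.0), (9.0, 9.0, 9.0)]
mask = [True, True, False]       # third point treated as background
translation = (1.0, 2.0, 3.0)    # predicted object translation

local_points = to_local_canonical(coarse_points, mask, translation)
refined = refine_rotation(rot_z(math.radians(30)), rot_z(math.radians(5)))
```

In this toy setup both rotations are about the same axis, so the refined rotation is simply a rotation by the sum of the two angles; general rotations do not commute, which is why the composition order has to be fixed as a convention.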


Pages: 4232-4241

Number of pages: 10
