GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

Cited by: 263
Authors
Wang, Gu [1, 2]
Manhardt, Fabian [2]
Tombari, Federico [2, 3]
Ji, Xiangyang [1]
Affiliations
[1] Tsinghua University, BNRist, Beijing, People's Republic of China
[2] Technical University of Munich, Munich, Germany
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021) | 2021
Funding
National Key Research and Development Program of China
Keywords
DOI
10.1109/CVPR46437.2021.01634
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
6D pose estimation from a single RGB image is a fundamental task in computer vision. The current top-performing deep learning-based methods rely on an indirect strategy, i.e., first establishing 2D-3D correspondences between the coordinates in the image plane and object coordinate system, and then applying a variant of the PnP/RANSAC algorithm. However, this two-stage pipeline is not end-to-end trainable, thus is hard to be employed for many tasks requiring differentiable poses. On the other hand, methods based on direct regression are currently inferior to geometry-based methods. In this work, we perform an in-depth investigation on both direct and indirect methods, and propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V datasets. Code is available at https://git.io/GDR-Net.
Pages: 16606-16616
Page count: 11