SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

Citations: 86
Authors
Di, Yan [1]
Manhardt, Fabian [2]
Wang, Gu [3]
Ji, Xiangyang [3]
Navab, Nassir [1]
Tombari, Federico [1, 2]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] Google, Mountain View, CA 94043 USA
[3] Tsinghua Univ, Beijing, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.01217
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (i.e. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, they are still inferior when compared with elaborate PnP/RANSAC-based approaches in terms of pose accuracy. In this work, we address this shortcoming by means of a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects which considerably enhances the accuracy of end-to-end 6D pose estimation. Our framework, named SO-Pose, takes a single RGB image as input and respectively generates 2D-3D correspondences as well as self-occlusion information harnessing a shared encoder and two separate decoders. Both outputs are then fused to directly regress the 6DoF pose parameters. Incorporating cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness, surpassing or rivaling all other state-of-the-art approaches on various challenging datasets.
Pages: 12376-12385
Page count: 10