Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation

Cited by: 330
Authors
Park, Kiru [1]
Patten, Timothy [1]
Vincze, Markus [1]
Affiliations
[1] TU Wien, Vis Robot Lab, Automat & Control Inst, Vienna, Austria
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
DEEP NETWORK;
DOI
10.1109/ICCV.2019.00776
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences to directly compute poses with the PnP algorithm with RANSAC iterations. Our method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. Evaluations on three different benchmark datasets containing symmetric and occluded objects show our method outperforms the state of the art using only RGB images.
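The idea of guiding predictions to the closest symmetric pose can be illustrated with a minimal sketch. The abstract does not give the formula for the transformer loss, so everything below is an assumption made for illustration: the function name `symmetric_min_loss`, the helper `rot_z`, the use of a mean L1 error, and the representation of the object's symmetries as a small pool of rotation matrices are all hypothetical and not taken from the paper.

```python
import numpy as np


def symmetric_min_loss(pred, target, sym_rotations):
    """Hypothetical symmetry-aware loss (illustrative assumption only).

    pred, target:   (N, 3) arrays of per-pixel 3D object coordinates.
    sym_rotations:  (K, 3, 3) rotations that map the object onto itself;
                    the pool should include the identity.
    Returns the smallest mean L1 error over the pool and its index.
    """
    # Rotate the target coordinates by every symmetry: shape (K, N, 3).
    candidates = np.einsum("kij,nj->kni", sym_rotations, target)
    # Mean L1 error of the prediction against each rotated target.
    losses = np.abs(candidates - pred[None]).mean(axis=(1, 2))
    best = int(np.argmin(losses))
    return float(losses[best]), best


def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy object with a 180-degree symmetry about the z-axis.
sym_pool = np.stack([rot_z(0.0), rot_z(np.pi)])

rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(100, 3))

# A prediction that matches the flipped, visually equivalent pose.
pred = target @ rot_z(np.pi).T

loss, idx = symmetric_min_loss(pred, target, sym_pool)
plain_l1 = float(np.abs(pred - target).mean())
```

In this toy case a plain L1 loss would penalize the prediction even though it matches a visually equivalent pose, while taking the minimum over the symmetry pool does not.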
Pages: 7667-7676
Page count: 10