Certifiable Object Pose Estimation: Foundations, Learning Models, and Self-Training
Cited by: 3
Authors:
Talak, Rajat [1]
Peng, Lisa R. [1,2]
Carlone, Luca [1]
Affiliations:
[1] MIT, Lab Informat & Decis Syst, Cambridge, MA 02139 USA
[2] Ample, San Francisco, CA 94107 USA
Funding:
U.S. National Science Foundation;
Keywords:
Certifiable models;
computer vision;
3D robot vision;
object pose estimation;
safe perception;
self-supervised learning;
PREDICTION;
DOI:
10.1109/TRO.2023.3271568
Chinese Library Classification (CLC) number:
TP24 [Robotics technology];
Subject classification codes:
080202;
1405;
Abstract:
In this article, we consider a certifiable object pose estimation problem, where, given a partial point cloud of an object, the goal is not only to estimate the object pose but also to provide a certificate of correctness for the resulting estimate. Our first contribution is a general theory of certification for end-to-end perception models. In particular, we introduce the notion of ε-correctness, which bounds the distance between an estimate and the ground truth. We then show that ε-correctness can be assessed by implementing two certificates: 1) a certificate of observable correctness, which asserts whether the model output is consistent with the input data and prior information; and 2) a certificate of nondegeneracy, which asserts whether the input data are sufficient to compute a unique estimate. Our second contribution is to apply this theory to design a new learning-based certifiable pose estimator. In particular, we propose C-3PO, a semantic-keypoint-based pose estimation model augmented with the two certificates, to solve the certifiable pose estimation problem. C-3PO also includes a keypoint corrector, implemented as a differentiable optimization layer, that can correct large detection errors (e.g., those due to the sim-to-real gap). Our third contribution is a novel self-supervised training approach that uses our certificate of observable correctness to provide the supervisory signal to C-3PO during training. In this approach, the model trains only on the observably correct input-output pairs produced in each batch and at each iteration. As training progresses, the fraction of observably correct input-output pairs grows, eventually reaching nearly 100% in many cases. We conduct extensive experiments on the ShapeNet and YCB datasets to evaluate the performance of the corrector, the certificates, and the proposed self-supervised training.
The experiments show that 1) standard semantic-keypoint-based methods (which constitute the backbone of C-3PO) outperform more recent alternatives in challenging problem instances; 2) C-3PO further improves performance and significantly outperforms all the baselines; and 3) C-3PO's certificates are able to discern correct pose estimates.
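The abstract describes the certificate of observable correctness, and its use as a filter during self-supervised training, only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes the check poses the CAD model with the estimate (R, t), then requires (a) every observed point to lie within a threshold of the posed model and (b) every predicted keypoint to lie within a threshold of the corresponding posed model keypoint. The function names `observably_correct` and `certified_subset` and the thresholds `eps_pc` and `eps_kp` are hypothetical, and the brute-force nearest-neighbour search is used only for clarity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p -> R p + t."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))


def max_nearest_distance(points: Sequence[Vec3], model: Sequence[Vec3]) -> float:
    """Largest distance from any observed point to its nearest model point (one-sided, brute force)."""
    return max(min(math.dist(p, q) for q in model) for p in points)


def observably_correct(points, R, t, cad_points, keypoints_pred, cad_keypoints,
                       eps_pc: float = 0.02, eps_kp: float = 0.02) -> bool:
    """Hypothetical certificate: accept the pose only if the posed CAD model
    explains the input points AND agrees with the predicted keypoints."""
    posed_model = [transform(R, t, q) for q in cad_points]
    posed_kp = [transform(R, t, b) for b in cad_keypoints]
    pc_ok = max_nearest_distance(points, posed_model) <= eps_pc
    kp_ok = max(math.dist(y, b) for y, b in zip(keypoints_pred, posed_kp)) <= eps_kp
    return pc_ok and kp_ok


def certified_subset(batch: List[Dict]) -> Tuple[List[Dict], float]:
    """Keep only samples that pass the certificate. In a self-training loop,
    only these pairs would contribute to the loss (loss not shown here)."""
    kept = [sample for sample in batch if observably_correct(**sample)]
    return kept, len(kept) / len(batch)
```

As a usage example, a sample whose estimated translation matches the observed points passes, while the same sample with the translation shifted by 0.5 fails, so `certified_subset` keeps one of the two samples and reports a certified fraction of 0.5. In this sketch, the growth of that fraction over training is what the abstract refers to as the observably correct pairs approaching 100%.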
Pages: 2805-2824
Number of pages: 20