Zero-Shot Category-Level Object Pose Estimation

Cited by: 28
Authors
Goodwin, Walter [1 ]
Vaze, Sagar [2 ]
Havoutis, Ioannis [1 ]
Posner, Ingmar [1 ]
Affiliations
[1] Univ Oxford, Oxford Robot Inst, Oxford, England
[2] Univ Oxford, Visual Geometry Grp, Oxford, England
Source
COMPUTER VISION, ECCV 2022, PT XXXIX | 2022 / Vol. 13699
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
DOI
10.1007/978-3-031-19842-7_30
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Object pose estimation is an important component of most vision pipelines for embodied agents, as well as in 3D vision more generally. In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference. Specifically, we make the following contributions. First, we formalise the zero-shot, category-level pose estimation problem and frame it in a way that is most applicable to real-world embodied agents. Secondly, we propose a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem. We further re-purpose the recent CO3D dataset to present a controlled and realistic test setting. Finally, we demonstrate that all baselines for our proposed task perform poorly, and show that our method provides a six-fold improvement in average rotation accuracy at 30°. Our code is available at https://github.com/applied-ai-lab/zero-shot-pose.
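The released code (linked above) is in Python. As a rough, illustrative sketch only, and not the authors' implementation, the snippet below shows one way the idea described in the abstract could be realised: semantic descriptors from a self-supervised vision transformer (e.g. DINO) are matched between a reference and a target view by mutual nearest neighbours, the matched pixels' back-projected 3D points are aligned with a least-squares (Kabsch) rotation, and a geodesic rotation error of the kind underlying the "accuracy at 30°" metric is computed. All names (feats_ref, pts_ref, etc.) and the availability of back-projected 3D points are assumptions made for illustration.

import numpy as np

def mutual_nearest_neighbours(feats_ref, feats_tgt):
    # feats_*: (N, D) arrays of per-point ViT descriptors (illustrative inputs).
    a = feats_ref / np.linalg.norm(feats_ref, axis=1, keepdims=True)
    b = feats_tgt / np.linalg.norm(feats_tgt, axis=1, keepdims=True)
    sim = a @ b.T                        # cosine similarity, shape (N_ref, N_tgt)
    nn_ab = sim.argmax(axis=1)           # best target index for each reference descriptor
    nn_ba = sim.argmax(axis=0)           # best reference index for each target descriptor
    ref_idx = np.arange(len(feats_ref))
    mutual = nn_ba[nn_ab] == ref_idx     # keep only cyclically consistent pairs
    return ref_idx[mutual], nn_ab[mutual]

def kabsch_rotation(p, q):
    # Least-squares rotation R with R @ p_i ~ q_i for point sets p, q of shape (N, 3).
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def rotation_error_deg(r_pred, r_gt):
    # Geodesic distance between two rotation matrices, in degrees.
    cos = (np.trace(r_pred.T @ r_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def relative_rotation(feats_ref, pts_ref, feats_tgt, pts_tgt):
    # pts_*: (N, 3) back-projected 3D points for the same descriptors (illustrative inputs).
    i, j = mutual_nearest_neighbours(feats_ref, feats_tgt)
    return kabsch_rotation(pts_ref[i], pts_tgt[j])

Under these assumptions, accuracy at 30° would be the fraction of evaluation pairs whose predicted rotation has a geodesic error (rotation_error_deg) below 30 degrees.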
Pages: 516-532
Page count: 17