GenPose: Generative Category-level Object Pose Estimation via Diffusion Models

Cited by: 0
Authors
Zhang, Jiyao [1 ,2 ,3 ]
Wu, Mingdong [1 ,3 ]
Dong, Hao [1 ,3 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Ctr Frontiers Comp Studies, Beijing, Peoples R China
[2] Beijing Acad Artificial Intelligence, Beijing, Peoples R China
[3] Peking Univ, Sch Comp Sci, Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
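Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract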
Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches encounter challenges with partially observed point clouds, known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset and demonstrates promising generalizability to novel categories sharing similar symmetric properties without fine-tuning. Furthermore, it can readily adapt to object pose tracking tasks, yielding comparable results to the current state-of-the-art baselines. Our checkpoints and demonstrations can be found at https://sites.google.com/view/genpose.
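The two-step aggregation described in the abstract (discard low-likelihood candidates, then mean-pool the rest) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `aggregate_candidates`, the `keep_ratio` parameter, and its default value are assumptions introduced here, and the candidates are treated as plain vectors such as translations. Rotation candidates would need a proper averaging scheme on the rotation group rather than an element-wise mean.

```python
import math
from typing import List, Sequence


def aggregate_candidates(
    candidates: Sequence[Sequence[float]],
    log_likelihoods: Sequence[float],
    keep_ratio: float = 0.6,  # hypothetical default, not taken from the paper
) -> List[float]:
    """Keep the most likely candidates and return their element-wise mean.

    `candidates` are sampled pose vectors (e.g. translations) and
    `log_likelihoods` are per-candidate scores, e.g. from an energy model.
    """
    if not candidates or len(candidates) != len(log_likelihoods):
        raise ValueError("need one log-likelihood per candidate")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Step 1: rank candidates by likelihood and drop the low-scoring tail.
    n_keep = max(1, math.ceil(keep_ratio * len(candidates)))
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: log_likelihoods[i],
        reverse=True,
    )
    kept = [candidates[i] for i in ranked[:n_keep]]

    # Step 2: mean-pool the surviving candidates, dimension by dimension.
    dim = len(kept[0])
    return [sum(c[d] for c in kept) / n_keep for d in range(dim)]


if __name__ == "__main__":
    # Four translation candidates; the third is an obvious outlier.
    samples = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [100.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    scores = [-1.0, -2.0, -50.0, -1.5]
    print(aggregate_candidates(samples, scores, keep_ratio=0.75))
```

Ranking by a scalar score is what makes a cheap likelihood estimate useful here: only the relative order of the candidates matters for the filtering step, so any monotone proxy for the likelihood would produce the same surviving set.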
Pages: 18