GenPose: Generative Category-level Object Pose Estimation via Diffusion Models

Cited by: 0

Authors:

Zhang, Jiyao [1,2,3]

Wu, Mingdong [1,3]

Dong, Hao [1,3]

Affiliations:

[1] Peking Univ, Sch Comp Sci, Ctr Frontiers Comp Studies, Beijing, Peoples R China

[2] Beijing Acad Artificial Intelligence, Beijing, Peoples R China

[3] Peking Univ, Sch Comp Sci, Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches encounter challenges with partially observed point clouds, known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset and demonstrates promising generalizability to novel categories sharing similar symmetric properties without fine-tuning. Furthermore, it can readily adapt to object pose tracking tasks, yielding comparable results to the current state-of-the-art baselines. Our checkpoints and demonstrations can be found at https://sites.google.com/view/genpose.
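
The two-step aggregation described in the abstract (discard low-likelihood candidates, then mean-pool the rest) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `aggregate_poses`, the `keep_ratio` parameter, the pose representation (unit quaternion plus translation), and the sign-aligned quaternion averaging are not taken from the paper, and the per-candidate scores are simply passed in rather than produced by a trained energy-based model.

```python
import math


def aggregate_poses(candidates, scores, keep_ratio=0.6):
    """Aggregate sampled pose candidates (hypothetical helper, not the authors' code).

    candidates: list of (quaternion, translation) pairs, with a unit quaternion
                [w, x, y, z] and a translation [tx, ty, tz].
    scores:     one likelihood-like score per candidate (higher = more plausible).
    keep_ratio: assumed fraction of candidates kept after outlier filtering.
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate and at least one candidate")

    # Step 1: rank by score and keep only the most likely candidates.
    n_keep = max(1, int(len(candidates) * keep_ratio))
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    kept = [candidates[i] for i in order[:n_keep]]

    # Step 2a: mean-pool the translations component-wise.
    t_mean = [sum(t[k] for _, t in kept) / len(kept) for k in range(3)]

    # Step 2b: approximate rotation mean. q and -q encode the same rotation,
    # so flip each quaternion onto the hemisphere of the best candidate,
    # sum, and renormalise. This is a simple approximation that is only
    # reasonable when the kept rotations are close to each other.
    ref = kept[0][0]
    q_sum = [0.0, 0.0, 0.0, 0.0]
    for q, _ in kept:
        sign = 1.0 if sum(a * b for a, b in zip(q, ref)) >= 0 else -1.0
        for k in range(4):
            q_sum[k] += sign * q[k]
    norm = math.sqrt(sum(v * v for v in q_sum))
    q_mean = [v / norm for v in q_sum]

    return q_mean, t_mean
```

In this sketch a candidate with a far-off translation and a low score is removed before pooling, so it does not drag the mean away from the consistent candidates. How the scores are obtained and how many candidates are kept are left open here.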

Pages: 18

