PoseDiffusion: A Coarse-to-Fine Framework for Unseen Object 6-DoF Pose Estimation

Cited by: 2
Authors
Zhou, Jiaming [1, 2]
Zhu, Qing [1, 2]
Wang, Yaonan [1, 2]
Feng, Mingtao [3]
Wu, Chengzhong [4]
Liu, Xuebing [1, 2]
Huang, Jianan [1, 2]
Mian, Ajmal [5]
Affiliations
[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410012, Peoples R China
[2] Natl Engn Res Ctr Robot Visual Percept & Control, Changsha 410082, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
[4] Jiangxi Prov Commun Terminal Ind Co Ltd, Jian 343000, Peoples R China
[5] Univ Western Australia, Dept Comp Sci & Software Engn, Perth, WA 6009, Australia
Keywords
Diffusion model; robotic grasping; transformer; unseen object pose estimation
DOI
10.1109/TII.2024.3399886
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Accurately estimating the six-degrees-of-freedom (6-DoF) pose of unseen objects is crucial for successful robotic manipulation in industrial automation. Some existing methods for this task rely on prior knowledge of individual objects, i.e., the model must be trained on the exact object instance or object category. Others perform unseen object pose estimation but are limited in their feature learning and pose refinement ability. To address these problems, we propose an unseen object pose estimation method that follows a coarse-to-fine framework and leverages the powerful learning ability of diffusion models. We introduce a diffusion model for generating object poses, and compare the generated poses with the original pose to select the optimal one. We design a novel pose estimation module to provide coarse poses for PoseDiffusion. This module comprises two feature extraction modules that extract global and masked features. In addition, we propose a strategy to estimate the pose by comparing the similarity between rendered and query poses. The renderings of an unseen object from various viewpoints are generated from its computer-aided design (CAD) model. Our method requires a CAD model of the unseen object only during inference, a scenario well suited to industrial applications. Experimental evaluation on benchmark datasets demonstrates that the proposed framework outperforms existing approaches, achieving state-of-the-art performance in 6-DoF object pose estimation.
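
The coarse stage described in the abstract, picking a pose by comparing a query against renderings of the CAD model from several viewpoints, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `select_coarse_pose`, the use of cosine similarity as the comparison measure, and the toy three-dimensional feature vectors are all assumptions made for illustration. In the paper the features would come from learned extraction modules, which are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def select_coarse_pose(
    query_feature: Sequence[float],
    template_features: List[Sequence[float]],
    template_poses: List[str],
) -> Tuple[int, str, float]:
    """Return (index, pose label, score) of the best-matching rendered view.

    Hypothetical helper: each template feature is assumed to describe one
    rendering of the CAD model, and template_poses holds the viewpoint
    (here just a label) that produced that rendering.
    """
    if not template_features or len(template_features) != len(template_poses):
        raise ValueError("need one pose per template feature")
    scores = [cosine_similarity(query_feature, t) for t in template_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, template_poses[best], scores[best]


# Toy data (assumed, not from the paper): three rendered views and one query.
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = ["view_0", "view_1", "view_2"]
query = [0.1, 0.9, 0.2]

index, pose, score = select_coarse_pose(query, templates, poses)
print(index, pose, round(score, 3))
```

The selected viewpoint would serve only as a coarse estimate; according to the abstract, refinement is then handled by the diffusion model, which this sketch does not cover.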
Pages: 11127-11138
Number of pages: 12