GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

Cited by: 0
Authors
Wang, Pengyuan [1]
Ikeda, Takuya [2]
Lee, Robert [2]
Nishiwaki, Koichi [2]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] Woven Toyota, Tokyo, Japan
Source
COMPUTER VISION - ECCV 2024, PT XXVII | 2025 / Vol. 15085
DOI
10.1007/978-3-031-73383-3_7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but they are typically hindered by the need for large datasets of pose-labelled real images or for carefully tuned photorealistic simulators. This can be avoided by using only geometric inputs such as depth images, which reduces the domain gap, but such approaches lack semantic information, which can be vital for pose estimation. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D semantic features into object models as 3D semantic point clouds. Based on this novel 3D representation, we further propose a self-supervision pipeline that matches the fused semantic point clouds against partial observations rendered from synthetic object models. The knowledge learned from synthetic data generalizes to observations of unseen objects in real scenes without any fine-tuning. We demonstrate this with an extensive evaluation on the NOCS, Wild6D and SUN RGB-D benchmarks, showing superior performance over geometric-only and semantic-only baselines with significantly fewer training objects.
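As an illustration of the idea of lifting 2D semantic features into a 3D semantic point cloud, the following is a minimal sketch and not the authors' implementation. It assumes a single view, a pinhole camera model, and a per-pixel feature map already extracted by some pre-trained model; the function name `lift_features_to_points` and all parameters (`fx`, `fy`, `cx`, `cy`) are hypothetical. The abstract does not specify how features from multiple views are fused, so that step is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def lift_features_to_points(
    depth: Sequence[Sequence[float]],
    features: Sequence[Sequence[Sequence[float]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point, List[float]]]:
    """Back-project every pixel with valid depth to a 3D point in the
    camera frame and attach that pixel's feature vector to the point.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), and depth[v][u] in metres.
    """
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as a missing measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cloud.append(((x, y, z), list(features[v][u])))
    return cloud


# Toy 2x2 depth image with one missing pixel and 2-D features per pixel.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
features = [[[0.1, 0.9], [0.5, 0.5]],
            [[0.3, 0.7], [0.8, 0.2]]]
cloud = lift_features_to_points(depth, features, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each entry of `cloud` pairs a 3D coordinate with a semantic feature vector, which is the kind of representation a correspondence-matching stage could consume.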
Pages: 108-126
Page count: 19