GRPoseNet: a generalizable and robust 6D object pose estimation network using sparse RGB views

Times Cited: 0
Authors
Shi, Wubin [1 ,2 ]
Gai, Shaoyan [1 ,2 ]
Da, Feipeng [1 ,2 ]
Cai, Zeyu [1 ,2 ]
Wang, Jiaoling [3 ,4 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing, Peoples R China
[2] Key Lab Measurement & Control Complex Syst Engn, Nanjing, Peoples R China
[3] Zhejiang Univ, Coll Biosyst Engn & Food Sci, Zhejiang Prov Key Lab Agr Intelligent Equipment &, Hangzhou, Peoples R China
[4] Minist Agr & Rural Affairs, Nanjing Inst Agr Mechanizat, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object pose estimation; Deep learning; Generalizable; Zero-shot
Keywords Plus
MODEL; SHOT
DOI
10.1007/s00371-025-03852-6
CLC Number
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
Six-degree-of-freedom (6D) object pose estimation plays a crucial role in many computer vision and robotics tasks. Existing methods often rely heavily on CAD models and substantial prior information, which limits their generalization to unseen objects in open scenes. To address this limitation, we propose GRPoseNet, a generalizable and robust 6D object pose estimation network that predicts the pose of unseen objects using only sparse RGB images with reference poses. GRPoseNet comprises an open-world detector, a viewpoint selector, and an adaptive multi-scale refiner. The open-world detector leverages pre-trained large models for zero-shot segmentation and feature extraction, avoiding the detection and matching failures that unseen objects typically cause. The viewpoint selector uses our designed similarity network to choose the reference view most similar to the query for initial pose estimation. The adaptive multi-scale refiner then refines the pose by iteratively updating rotation and translation residuals based on multi-scale features and adaptive weights. Extensive experiments on benchmark datasets and our robustness test dataset, RBMOP, demonstrate that GRPoseNet achieves state-of-the-art performance, with excellent generalization and robustness to unseen objects and sparse views. The code and datasets are available at: https://github.com/KierSaS/GRPoseNet.
Pages: 8009-8023
Number of Pages: 15
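
To make the abstract's description concrete, below is a minimal Python sketch of the three-stage pipeline it outlines (open-world detector, viewpoint selector, adaptive multi-scale refiner). Every name and interface here is a hypothetical stand-in for illustration, not GRPoseNet's actual API: plain cosine similarity replaces the paper's learned similarity network, and an identity residual replaces the learned refiner. See the repository linked above for the real implementation.

import numpy as np

rng = np.random.default_rng(0)

def detect_and_embed(img):
    # Stand-in for the open-world detector: the paper leverages pre-trained
    # large models for zero-shot segmentation and feature extraction.
    # Here we simply average-pool the image into a small feature vector.
    return img.mean(axis=(0, 1))

def select_viewpoint(query_feat, ref_feats, ref_poses):
    # Stand-in for the viewpoint selector: the paper uses a learned
    # similarity network; this sketch uses plain cosine similarity and
    # returns the pose of the most similar reference view.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    best = int(np.argmax([cos(query_feat, f) for f in ref_feats]))
    return ref_poses[best]

def refine(pose, n_iters=3):
    # Stand-in for the adaptive multi-scale refiner: the paper iteratively
    # predicts rotation/translation residuals from multi-scale features
    # with adaptive weights; here an identity residual shows the loop only.
    R, t = pose
    for _ in range(n_iters):
        dR, dt = np.eye(3), np.zeros(3)  # a real refiner predicts these
        R, t = dR @ R, t + dt
    return R, t

# Toy usage: four reference views with known poses, one query image.
ref_imgs = [rng.random((64, 64, 3)) for _ in range(4)]
ref_feats = [detect_and_embed(im) for im in ref_imgs]
ref_poses = [(np.eye(3), rng.normal(size=3)) for _ in range(4)]
query_img = ref_imgs[2] + 0.01 * rng.random((64, 64, 3))  # near view 2

R0, t0 = select_viewpoint(detect_and_embed(query_img), ref_feats, ref_poses)
R, t = refine((R0, t0))
print("initial pose translation:", t0)
print("refined pose translation:", t)

Because the sketch's refiner applies identity residuals, the refined pose equals the initial one; the point is only the control flow from detection through viewpoint selection to iterative refinement.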