Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image

Cited by: 18
Authors
Fan, Zhaoxin [1 ]
Song, Zhenbo [2 ]
Xu, Jian [4 ]
Wang, Zhicheng [4 ]
Wu, Kejian [4 ]
Liu, Hongyan [3 ]
He, Jun [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Key Lab Data Engn & Knowledge Engn MOE, Beijing 100872, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Tsinghua Univ, Dept Management Sci & Engn, Beijing 100084, Peoples R China
[4] Nreal, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2022, PT II | 2022, Vol. 13662
Funding
National Natural Science Foundation of China;
Keywords
Category-level 6D pose estimation; Object-level depth; Position hints; Decoupled depth reconstruction;
DOI
10.1007/978-3-031-20086-1_13
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, RGBD-based category-level 6D object pose estimation has achieved promising performance; however, the requirement of depth information prohibits broader applications. To alleviate this problem, this paper proposes a novel approach named Object Level Depth reconstruction Network (OLD-Net), which takes only RGB images as input for category-level 6D object pose estimation. We propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation. Two novel modules, Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth Reconstruction (SDDR), are introduced to learn high-fidelity object-level depth and delicate shape representations. Finally, the 6D object pose is solved by aligning the predicted canonical representation with the back-projected object-level depth. Extensive experiments on the challenging CAMERA25 and REAL275 datasets show that our model, though simple, achieves state-of-the-art performance.
Pages: 220-236
Number of pages: 17
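
The final step described in the abstract, recovering the 6D pose by aligning the predicted canonical (NOCS) representation with the back-projected object-level depth, is commonly solved with a closed-form similarity-transform estimator such as the Umeyama algorithm. The sketch below is a minimal illustration of that alignment step only, not the authors' implementation; the helper names (backproject, umeyama_alignment), the pinhole back-projection model, and the absence of RANSAC-style outlier rejection are assumptions.

import numpy as np


def backproject(depth, mask, K):
    # Back-project an object-level depth map to a 3D point cloud in the
    # camera frame (hypothetical helper, assumed pinhole model).
    # depth: (H, W) metric depth; mask: (H, W) boolean object mask;
    # K: (3, 3) camera intrinsics.
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)  # (N, 3)


def umeyama_alignment(src, dst):
    # Closed-form Umeyama solution for dst ~ s * R @ src + t, where src are
    # predicted canonical (NOCS) coordinates and dst are back-projected depth
    # points in one-to-one correspondence, both of shape (N, 3).
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # cross-covariance (3, 3)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # handle reflection case
    R = U @ S @ Vt                            # rotation
    var_src = (src_c ** 2).sum() / len(src)   # variance of source points
    s = np.trace(np.diag(D) @ S) / var_src    # isotropic scale (object size)
    t = mu_dst - s * R @ mu_src               # translation
    return s, R, t

Given per-pixel NOCS predictions nocs_pts and the reconstructed object-level depth for the same pixels, s, R, t = umeyama_alignment(nocs_pts, backproject(depth, mask, K)) would recover scale, rotation, and translation; the paper's actual pose-solving procedure may differ in details such as outlier handling.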