FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Cited by: 109
Authors
Chen, Wei [1 ]
Jia, Xi [1 ]
Chang, Hyung Jin [1 ]
Duan, Jinming [1 ]
Shen, Linlin [2 ]
Leonardis, Ales [1 ]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Comp Vis Inst, Shenzhen, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/CVPR46437.2021.00163
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift- and scale-invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them via two residuals: the difference between the mean of the object points and the ground-truth translation, and the difference between the mean size of the category and the ground-truth size, respectively. Finally, to increase the generalization ability of FS-Net, we propose an online box-cage-based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.
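The residual formulation for translation and size described in the abstract reduces to simple arithmetic at inference time: the network predicts offsets, and the final estimates are recovered by adding them to the point-cloud centroid and the category's mean size. The sketch below illustrates only this recovery step; the function names and toy numbers are illustrative assumptions, not taken from the paper's code.

```python
def recover_translation(object_points, residual_t):
    """Translation = centroid of observed object points + predicted residual."""
    n = len(object_points)
    centroid = [sum(p[i] for p in object_points) / n for i in range(3)]
    return [c + r for c, r in zip(centroid, residual_t)]

def recover_size(category_mean_size, residual_s):
    """Size = pre-computed category mean size + predicted residual."""
    return [m + r for m, r in zip(category_mean_size, residual_s)]

# Toy example: two observed points with centroid (1.0, 2.0, 3.0), plus
# hypothetical network-predicted residuals.
pts = [(0.9, 2.0, 3.1), (1.1, 2.0, 2.9)]
t = recover_translation(pts, (0.05, -0.05, 0.0))        # approx [1.05, 1.95, 3.0]
s = recover_size((0.2, 0.3, 0.4), (0.01, 0.0, -0.02))   # approx [0.21, 0.30, 0.38]
```

Predicting residuals instead of absolute values keeps the regression targets small and roughly zero-centered, which is the usual motivation for this kind of parameterization.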
Pages: 1581-1590 (10 pages)