FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Cited by: 109

Authors:

Chen, Wei [1]

Jia, Xi [1]

Chang, Hyung Jin [1]

Duan, Jinming [1]

Shen, Linlin [2]

Leonardis, Ales [1]

Affiliations:

[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England

[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Comp Vis Inst, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

DOI:

10.1109/CVPR46437.2021.00163

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift- and scale-invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them by two residuals: the difference between the mean of object points and ground truth translation, and the difference between the mean size of the category and ground truth size, respectively. Finally, to increase the generalization ability of the FS-Net, we propose an online box-cage-based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. In particular, for category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.
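The residual formulation for translation and size described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`centroid`, `recover_translation`, `recover_size`) are hypothetical, and the sign convention (residual = ground truth minus reference value) is an assumption, since the abstract only states that each residual is a difference. In practice the residuals would be predicted by the network; here they are passed in as plain numbers.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid(points: Sequence[Vec3]) -> Vec3:
    """Mean of the observed object points (x, y, z)."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def recover_translation(points: Sequence[Vec3], residual_t: Vec3) -> Vec3:
    """Translation = mean of object points + predicted residual.

    Assumption: the residual is defined as (ground truth - point mean).
    """
    c = centroid(points)
    return tuple(c[i] + residual_t[i] for i in range(3))


def recover_size(category_mean_size: Vec3, residual_s: Vec3) -> Vec3:
    """Size = category mean size + predicted residual.

    Assumption: the residual is defined as (ground truth - category mean).
    """
    return tuple(category_mean_size[i] + residual_s[i] for i in range(3))


# Toy usage with made-up numbers (not taken from the paper).
points = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (1.0, 3.0, 1.0)]
translation = recover_translation(points, (0.5, -0.25, 0.25))
size = recover_size((0.25, 0.5, 0.75), (0.25, 0.0, -0.25))
```

Predicting a residual rather than the absolute value keeps the regression target small and centred near zero, which is the usual motivation for this kind of formulation.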

Citation

Pages: 1581-1590

Page count: 10

46 references in total

