Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation

Cited by: 123
Authors
Chen, Dengsheng [1]
Li, Jun [1]
Wang, Zheng [3]
Xu, Kai [1, 2]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] SpeedBot Robot Ltd, Changsha, Peoples R China
[3] Taobao Com, Hangzhou, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.01199
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel approach to category-level 6D object pose and size estimation. To tackle intra-class shape variations, we learn a canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category. In particular, CASS is modeled as the latent space of a deep generative model of canonical 3D shapes with normalized pose. We train a variational auto-encoder (VAE) for generating 3D point clouds in the canonical space from an RGBD image. The VAE is trained in a cross-category fashion, exploiting publicly available large 3D shape repositories. Since the 3D point cloud is generated in normalized pose (with actual size), the encoder of the VAE learns a view-factorized RGBD embedding: it maps an RGBD image in an arbitrary view into a pose-independent 3D shape representation. Object pose is then estimated by contrasting this representation with a pose-dependent feature of the input RGBD image, extracted with a separate deep neural network. We integrate the learning of CASS and pose and size estimation into an end-to-end trainable network, achieving state-of-the-art performance.
Pages: 11970-11979
Page count: 10
Related papers
33 in total
[1]   Scan2CAD: Learning CAD Model Alignment in RGB-D Scans [J].
Avetisyan, Armen ;
Dahnert, Manuel ;
Dai, Angela ;
Savva, Manolis ;
Chang, Angel X. ;
Niessner, Matthias .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :2609-2618
[2]   HPatches: A benchmark and evaluation of handcrafted and learned local descriptors [J].
Balntas, Vassileios ;
Lenc, Karel ;
Vedaldi, Andrea ;
Mikolajczyk, Krystian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :3852-3861
[3]  
Brachmann E, 2014, LECT NOTES COMPUT SC, V8690, P536, DOI 10.1007/978-3-319-10605-2_35
[4]  
Chang AX, 2015, Technical Report Tech Report
[5]  
Choi C, 2012, IEEE INT C INT ROBOT, P3342, DOI 10.1109/IROS.2012.6386067
[6]  
Choi C, 2012, IEEE INT CONF ROBOT, P1724, DOI 10.1109/ICRA.2012.6225371
[7]   Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis [J].
Dai, Angela ;
Qi, Charles Ruizhongtai ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6545-6554
[8]  
Do T.-T., 2018, ARXIV
[9]  
Georgakis Georgios, 2018, ARXIV
[10]   Learning a Predictable and Generative Vector Representation for Objects [J].
Girdhar, Rohit ;
Fouhey, David F. ;
Rodriguez, Mikel ;
Gupta, Abhinav .
COMPUTER VISION - ECCV 2016, PT VI, 2016, 9910 :484-499