Category Level Object Pose Estimation via Neural Analysis-by-Synthesis

Cited by: 69

Authors:

Chen, Xu [1,3]

Dong, Zijian [1]

Song, Jie [1]

Geiger, Andreas [2,4]

Hilliges, Otmar [1]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] Univ Tubingen, Tubingen, Germany

[3] Max Planck ETH Ctr Learning Syst, Tubingen, Germany

[4] Max Planck Inst Intelligent Syst, Tubingen, Germany

Source:

COMPUTER VISION - ECCV 2020, PT XXVI | 2020 / Vol. 12371

Keywords:

Category-level object pose; 6DoF pose estimation

DOI:

10.1007/978-3-030-58574-7_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Many object pose estimation algorithms rely on the analysis-by-synthesis framework, which requires explicit representations of individual object instances. In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus removing the need for explicit CAD models per object instance. The image synthesis network is designed to efficiently span the pose configuration space so that model capacity can be used to capture the shape and local appearance (i.e., texture) variations jointly. At inference time, the synthesized images are compared to the target via an appearance-based loss, and the error signal is back-propagated through the network to the input parameters. With the network parameters kept fixed, this allows the object pose, shape and appearance to be optimized jointly and iteratively. We show experimentally that the method can recover the orientation of objects with high accuracy from 2D images alone. When provided with depth measurements to overcome scale ambiguities, the method can accurately recover the full 6DoF pose.
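The inference loop described in the abstract (synthesize, compare, push the error back to the inputs while the generator stays fixed) can be illustrated with a toy sketch. This is not the authors' network or code: `synthesize`, `appearance_loss` and `fit_pose` are hypothetical names, the "generator" is a fixed cosine function over a 1D image, `theta` stands in for pose and `z` for an appearance code, and central finite differences are swapped in for back-propagation to keep the example dependency-free.

```python
import math

N_PIXELS = 16  # size of the toy 1D "image" (an assumption for illustration)


def synthesize(theta, z):
    """Toy stand-in for a fixed image synthesis network.

    theta plays the role of the pose, z the role of an appearance code.
    """
    return [z * math.cos(2.0 * math.pi * i / N_PIXELS - theta)
            for i in range(N_PIXELS)]


def appearance_loss(image, target):
    """Mean squared pixel difference between synthesized and target image."""
    return sum((a - b) ** 2 for a, b in zip(image, target)) / len(target)


def fit_pose(target, theta=0.0, z=1.0, lr=0.1, steps=500, eps=1e-5):
    """Gradient descent on the inputs only; the generator never changes."""
    def loss_at(t, c):
        return appearance_loss(synthesize(t, c), target)

    for _ in range(steps):
        # Central finite differences stand in for back-propagation here.
        g_theta = (loss_at(theta + eps, z) - loss_at(theta - eps, z)) / (2 * eps)
        g_z = (loss_at(theta, z + eps) - loss_at(theta, z - eps)) / (2 * eps)
        theta -= lr * g_theta
        z -= lr * g_z
    return theta, z, loss_at(theta, z)


# "Observed" image whose pose and appearance the optimizer does not know.
target_image = synthesize(0.8, 1.5)
theta_hat, z_hat, final_loss = fit_pose(target_image)
```

In this toy setting the estimates should end up close to the values used to generate `target_image`. A real system would replace the cosine function with a trained category-level generator and obtain gradients by automatic differentiation.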


Pages: 139-156

Page count: 18

