Weak Multi-View Supervision for Surface Mapping Estimation

Citations: 0

Authors:

Rai, Nishant [1,2,3]

Liaudanskas, Aidas [1]

Rao, Srinivas [1]

Cayon, Rodrigo Ortiz [1]

Munaro, Matteo [1]

Holzer, Stefan [1]

Affiliations:

[1] Fyusion Inc, San Francisco, CA 94105 USA

[2] Stanford Univ, Stanford, CA 94305 USA

[3] Fyusion, San Francisco, CA USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021

Keywords:

SHAPE;

DOI:

10.1109/CVPRW53098.2021.00310

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations. We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories. While traditional approaches solve this problem using extensive supervision in the form of pixel-level annotations, we take advantage of the fact that pixel-level UV and mesh predictions can be combined with 3D reprojections to form consistency cycles. As a result of exploiting these cycles, we can establish a dense correspondence mapping between image pixels and the mesh acting as a self-supervisory signal, which in turn helps improve our overall estimates. Our approach leverages information from multiple views of the object to establish additional consistency cycles, thus improving surface mapping understanding without the need for explicit annotations. We also propose the use of deformation fields for predictions of an instance-specific mesh. Given the lack of datasets providing multiple images of similar object instances from different viewpoints, we generate and release a multi-view ShapeNet Cars and Airplanes dataset created by rendering ShapeNet meshes using a 360° camera trajectory around the mesh. For the human faces category, we process and adapt an existing dataset to a multi-view setup. Through experimental evaluations, we show that, at test time, our method can generate accurate variations away from the mean shape, is multi-view consistent, and performs comparably to fully supervised approaches.
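
The consistency cycle mentioned in the abstract (pixel, to predicted surface point, to 3D reprojection, back to pixel) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project` and `cycle_consistency_error`, the pinhole camera model, and all numeric values are assumptions chosen for illustration, and the surface points stand in for whatever the UV and mesh predictions would return.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # camera frame -> homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def cycle_consistency_error(pixels, surface_points, K, R, t):
    """Mean distance between each pixel and the reprojection of its mapped 3D point."""
    reprojected = project(surface_points, K, R, t)
    return float(np.mean(np.linalg.norm(reprojected - pixels, axis=1)))

# Hypothetical camera: intrinsics K, identity rotation, 2 units in front of the object.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

# Hypothetical surface points that a pixel-to-surface mapping assigned to two pixels.
surface_points = np.array([[0.0, 0.0, 0.0], [0.1, -0.2, 0.3]])
pixels = project(surface_points, K, R, t)

# A perfect mapping closes the cycle exactly; a shifted one leaves a residual.
perfect = cycle_consistency_error(pixels, surface_points, K, R, t)
shifted = cycle_consistency_error(pixels + np.array([3.0, 4.0]), surface_points, K, R, t)
```

In a multi-view setting, the same residual could be evaluated for one surface point under several camera poses, which is one way to read the "additional consistency cycles" described above.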


Pages: 2753-2762

Page count: 10

