Weak Multi-View Supervision for Surface Mapping Estimation

Cited by: 0
Authors
Rai, Nishant [1, 2, 3]
Liaudanskas, Aidas [1]
Rao, Srinivas [1]
Cayon, Rodrigo Ortiz [1]
Munaro, Matteo [1]
Holzer, Stefan [1]
Affiliations
[1] Fyusion Inc, San Francisco, CA 94105 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Fyusion, San Francisco, CA USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021
Keywords
SHAPE;
DOI
10.1109/CVPRW53098.2021.00310
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We propose a weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations. We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories. While traditional approaches solve this problem using extensive supervision in the form of pixel-level annotations, we take advantage of the fact that pixel-level UV and mesh predictions can be combined with 3D reprojections to form consistency cycles. As a result of exploiting these cycles, we can establish a dense correspondence mapping between image pixels and the mesh acting as a self-supervisory signal, which in turn helps improve our overall estimates. Our approach leverages information from multiple views of the object to establish additional consistency cycles, thus improving surface mapping understanding without the need for explicit annotations. We also propose the use of deformation fields for predictions of an instance-specific mesh. Given the lack of datasets providing multiple images of similar object instances from different viewpoints, we generate and release a multi-view ShapeNet Cars and Airplanes dataset created by rendering ShapeNet meshes using a 360° camera trajectory around the mesh. For the human faces category, we process and adapt an existing dataset to a multi-view setup. Through experimental evaluations, we show that, at test time, our method can generate accurate variations away from the mean shape, is multi-view consistent, and performs comparably to fully supervised approaches.
Pages: 2753 - 2762
Page count: 10