Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Cited by: 120
Authors
Reizenstein, Jeremy [1]
Shapovalov, Roman [1]
Henzler, Philipp [2]
Sbordone, Luca [1]
Labatut, Patrick [1]
Novotny, David [1]
Affiliations
[1] Facebook AI Res, Menlo Pk, CA 94025 USA
[2] UCL, London, England
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.01072
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field by collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-scale dataset, called Common Objects in 3D, with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds. The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such, it is significantly larger than alternatives both in terms of the number of categories and objects. We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods. Finally, we contribute NerFormer - a novel neural rendering method that leverages the powerful Transformer to reconstruct an object given a small number of its views.
Pages: 10881-10891
Page count: 11