PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

Cited by: 25
Authors
Melas-Kyriazi, Luke [1]
Rupprecht, Christian [1]
Vedaldi, Andrea [1]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Dept Engn Sci, Oxford, England
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.01242
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reconstructing the 3D shape of an object from a single RGB image is a long-standing problem in computer vision. In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process. Our method takes as input a single RGB image along with its camera pose and gradually denoises a set of 3D points, whose positions are initially sampled randomly from a three-dimensional Gaussian distribution, into the shape of an object. The key to our method is a geometrically-consistent conditioning process which we call projection conditioning: at each step in the diffusion process, we project local image features onto the partially-denoised point cloud from the given camera pose. This projection conditioning process enables us to generate high-resolution sparse geometries that are well-aligned with the input image and can additionally be used to predict point colors after shape reconstruction. Moreover, due to the probabilistic nature of the diffusion process, our method is naturally capable of generating multiple different shapes consistent with a single input image. In contrast to prior work, our approach not only performs well on synthetic benchmarks but also gives large qualitative improvements on complex real-world data. Data and code are available at https://lukemelas.github.io/projectionconditioned-point-cloud-diffusion/.
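The projection conditioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple pinhole camera (rotation `R`, translation `t`, focal length `focal`, principal point `cx`, `cy`), a nearest-pixel feature lookup, and no occlusion handling. The function name `projection_condition` and all parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def projection_condition(
    points: Sequence[Sequence[float]],
    feature_map: Sequence[Sequence[Sequence[float]]],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    focal: float,
    cx: float,
    cy: float,
) -> List[List[float]]:
    """Append to each 3D point the image feature found at its projected pixel.

    Assumptions (not taken from the paper): pinhole camera, nearest-pixel
    lookup, no occlusion test. feature_map[row][col] is a list of C floats.
    Points behind the camera or outside the image receive zero features.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])

    conditioned = []
    for p in points:
        # World frame -> camera frame: x_cam = R @ p + t
        x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

        feat = [0.0] * channels
        if z > 0:  # only points in front of the camera can be projected
            # Perspective division, then round to the nearest pixel
            col = math.floor(focal * x / z + cx + 0.5)
            row = math.floor(focal * y / z + cy + 0.5)
            if 0 <= row < height and 0 <= col < width:
                feat = list(feature_map[row][col])

        # Conditioned point = [x, y, z, f_1, ..., f_C] in world coordinates
        conditioned.append(list(p) + feat)
    return conditioned
```

In a diffusion sampler built along these lines, this function would presumably be called at every denoising step on the current partially-denoised points, with the result passed to the denoising network so that its prediction stays aligned with the input image.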
Pages: 12923-12932
Page count: 10