Vox-E: Text-guided Voxel Editing of 3D Objects

Cited by: 21
Authors
Sella, Etai [1]
Fiebelman, Gal [1]
Hedman, Peter [2]
Averbuch-Elor, Hadar [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
[2] Google Res, New York, NY 10011 USA
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
DOI
10.1109/ICCV51070.2023.00046
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images that convey complex visual concepts. This generative power has more recently been leveraged to perform text-to-3D synthesis. In this work, we present a technique that harnesses the power of latent diffusion models for editing existing 3D objects. Our method takes oriented 2D images of a 3D object as input and learns a grid-based volumetric representation of it. To guide the volumetric representation to conform to a target text prompt, we follow unconditional text-to-3D methods and optimize a Score Distillation Sampling (SDS) loss. However, we observe that combining this diffusion-guided loss with an image-based regularization loss that encourages the representation not to deviate too strongly from the input object is challenging, as it requires achieving two conflicting goals while viewing only structure-and-appearance coupled 2D projections. Thus, we introduce a novel volumetric regularization loss that operates directly in 3D space, utilizing the explicit nature of our 3D representation to enforce correlation between the global structure of the original and edited object. Furthermore, we present a technique that optimizes cross-attention volumetric grids to refine the spatial extent of the edits. Extensive experiments and comparisons demonstrate the effectiveness of our approach in creating a myriad of edits which cannot be achieved by prior works.
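
The volumetric regularization idea in the abstract (encouraging correlation between the structure of the original and edited object, directly in 3D) can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption: the function name `correlation_regularizer`, the use of Pearson correlation over flattened density grids, and the `eps` stabilizer are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def correlation_regularizer(density_orig, density_edit, eps=1e-8):
    """Illustrative loss: 1 - Pearson correlation between two density grids.

    Assumption: both inputs are voxel grids of the same shape holding
    per-voxel density values. The loss is near 0 when the grids are
    perfectly correlated and grows as their global structure diverges.
    """
    a = np.asarray(density_orig, dtype=np.float64).ravel()
    b = np.asarray(density_edit, dtype=np.float64).ravel()
    # Center both grids so the dot product below is a covariance term.
    a = a - a.mean()
    b = b - b.mean()
    # Normalized covariance (Pearson correlation); eps avoids division by zero.
    corr = (a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps)
    return 1.0 - corr
```

Because the loss depends only on correlation, an edited grid that is a positive affine rescaling of the original incurs almost no penalty, while structural changes do. In a setup like the one the abstract describes, a term of this kind would presumably be weighted against the SDS loss during optimization.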
Pages: 430-440
Number of pages: 11