OBJECT 3DIT: Language-guided 3D-aware Image Editing

Authors
Michel, Oscar [1]
Bhattad, Anand [2]
VanderBilt, Eli [1]
Krishna, Ranjay [1, 3]
Kembhavi, Aniruddha [1]
Gupta, Tanmay [1]
Affiliations
[1] Allen Inst Artificial Intelligence, Seattle, WA 98103 USA
[2] Univ Illinois, Champaign, IL USA
[3] Univ Washington, Seattle, WA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, an editing instruction in natural language, and the edited image. We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Surprisingly, despite being trained only on synthetic scenes from OBJECT, the editing capabilities of 3DIT generalize to real-world images. More information can be found on the project page at https://prior.allenai.org/projects/object-edit.
Pages: 20