Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Cited by: 59
Authors
Wang, Su [1]
Saharia, Chitwan [1]
Montgomery, Ceslee [1]
Pont-Tuset, Jordi [1]
Noy, Shai [1]
Pellegrini, Stefano [1]
Onoe, Yasumasa [1]
Laszlo, Sarah [1]
Fleet, David J. [1]
Soricut, Radu [1]
Baldridge, Jason [1]
Norouzi, Mohammad [1]
Anderson, Peter [1]
Chan, William [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
DOI
10.1109/CVPR52729.2023.01761
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to the input text prompts while remaining consistent with the input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen [36] on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object masking during training leads to across-the-board improvements in text-image alignment, such that Imagen Editor is preferred over DALL-E 2 [31] and Stable Diffusion [33]. We also find that, as a cohort, these models are better at object rendering than text rendering, and handle material/color/size attributes better than count/shape attributes.
Pages: 18359-18369
Page count: 11
Related papers
53 items in total
[1] Avrahami Omri, 2022, P CVPR
[2] Bar-Tal O., 2022, Text2live: Text-driven layered image and video editing
[3] Bau David, 2021, CoRR abs/2103.10951, P2
[4] Birhane Abeba, 2021, arXiv:2110.01963
[5] Couairon G., 2022, Diffedit: Diffusion-based semantic image editing with mask guidance
[6] De Bortoli V, 2021, ADV NEUR IN, V34
[7] Ding Ming, 2022, arXiv:2204.14217
[8] Dolnicar Sara, 2011, INT J MARKET RES, V12
[9] Gafni Oran, 2022, arXiv:2203.13131
[10] Gao Ying, 2022, P IEEE