Blended Diffusion for Text-driven Editing of Natural Images

Cited by: 398
Authors
Avrahami, Omri [1]
Lischinski, Dani [1]
Fried, Ohad [2]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Reichman Univ, Herzliyya, Israel
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022
Funding
Israel Science Foundation
DOI
10.1109/CVPR52688.2022.01767
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, background preservation, and agreement with the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
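The spatial blending step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and makes several assumptions: `guided_step` is a hypothetical placeholder for a CLIP-guided DDPM reverse step, the linear beta schedule is an assumed example, and the background is noised with the standard DDPM forward process q(x_t | x_0). At each noise level, the text-guided latent is kept inside the ROI mask and a freshly noised copy of the input image is used outside it.

```python
import numpy as np


def noise_to_level(x0, alpha_bar, rng):
    """Sample from the standard DDPM forward process q(x_t | x_0):
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def blended_diffusion(x0, mask, alpha_bars, guided_step, rng):
    """Blend a (hypothetical) text-guided latent with noised copies of the input.

    x0          input image, shape (H, W, C)
    mask        ROI mask, shape (H, W, 1): 1 inside the edit region, 0 outside
    alpha_bars  cumulative schedule; alpha_bars[0] == 1.0 (clean image),
                alpha_bars[t] is the noise level t = 1..T
    guided_step hypothetical callable (x_t, t) -> proposal for x_{t-1};
                stands in for a CLIP-guided DDPM reverse step
    """
    T = len(alpha_bars) - 1
    x_t = rng.standard_normal(x0.shape)  # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        x_fg = guided_step(x_t, t)                         # guided proposal at level t-1
        x_bg = noise_to_level(x0, alpha_bars[t - 1], rng)  # input noised to level t-1
        x_t = mask * x_fg + (1.0 - mask) * x_bg            # spatial blend
    return x_t  # alpha_bars[0] == 1, so the final background equals x0


# Usage with toy data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
H, W, C = 8, 8, 3
x0 = rng.uniform(-1.0, 1.0, size=(H, W, C))
mask = np.zeros((H, W, 1))
mask[:, : W // 2] = 1.0                          # edit the left half only
betas = np.linspace(1e-4, 0.02, 50)              # assumed linear schedule
alpha_bars = np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def toy_step(x_t, t):
    # Placeholder only: shrinks the latent; NOT a real denoiser or CLIP guidance.
    return 0.9 * x_t


edited = blended_diffusion(x0, mask, alpha_bars, toy_step, rng)
```

Because the last blend uses the input noised to level 0 (that is, the input itself), pixels outside the mask are copied unchanged, which matches the background preservation the abstract describes. The paper's actual CLIP guidance, augmentations, and noise schedule are not reproduced in this sketch.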
Pages: 18187-18197
Number of pages: 11