Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Cited by: 0
Authors
Li, Ruichen [1 ]
Wu, Lei [1 ]
Wang, Changshuo [1 ]
Dong, Pei [1 ]
Li, Xin [1 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024 | 2024, Vol. 14872
Keywords
Text-guided image editing; Diffusion model; Image manipulation
DOI
10.1007/978-981-97-5612-4_20
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works utilize diffusion models, and most studies focus on editing a single region based on a given text prompt. When the user delineates multiple regions, these models cannot edit the corresponding areas according to different text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model that handles multiple regions and their corresponding texts, focusing on entity-level object editing and layout-level background coordination at different denoising steps. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thin masks to ensure accurate editing of multiple objects. For layout-level background coordination, we not only replace the random noise in the background region with a noisy version of the original scene image during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise levels between edited image regions. Extensive experiments show that our model outperforms all baselines in terms of multi-object entity editing and background coordination.
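To make the two mechanisms in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of (i) dilating thin user masks before entity-level editing and (ii) keeping the background region on the noise trajectory of the original image during reverse diffusion. The denoise_step callable, the alphas_cumprod schedule, and all parameter names are assumptions standing in for whatever diffusion backbone the paper actually uses; OutwardLPF is omitted because its exact filter is not specified in the abstract.

import torch
import torch.nn.functional as F

def dilate_mask(mask, kernel_size=7):
    # Morphological dilation of a binary mask (B, 1, H, W): max-pooling widens thin regions.
    pad = kernel_size // 2
    return F.max_pool2d(mask, kernel_size, stride=1, padding=pad)

def noisy_original(x0, t, alphas_cumprod):
    # Forward-diffuse the original image to step t, i.e. sample from q(x_t | x_0) in DDPM.
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * torch.randn_like(x0)

def edit_multi_region(x0, masks, prompts, denoise_step, alphas_cumprod, num_steps=50):
    # x0: original scene image or latent (B, C, H, W); masks: disjoint binary masks, one per prompt.
    # denoise_step(x_t, t, prompt=...) is a hypothetical wrapper around the diffusion backbone.
    masks = [dilate_mask(m) for m in masks]                   # entity level: protect thin masks
    union = torch.clamp(torch.stack(masks).sum(dim=0), 0, 1)  # union of all edited regions
    x_t = torch.randn_like(x0)                                # start the reverse process from noise
    for t in reversed(range(num_steps)):
        x_next = torch.zeros_like(x0)
        for m, p in zip(masks, prompts):
            # denoise the whole latent under prompt p, keep only the pixels inside its mask
            x_next = x_next + m * denoise_step(x_t, t, prompt=p)
        # layout level: the background follows the noised original image instead of random noise
        x_next = x_next + (1.0 - union) * noisy_original(x0, t, alphas_cumprod)
        x_t = x_next
    return x_t

In the full method, OutwardLPF would additionally smooth the noise-level transition across mask boundaries before compositing; the sketch above simply composites hard-masked regions.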
Pages: 229-240
Number of pages: 12