Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Cited by: 0

Authors:

Li, Ruichen [1]

Wu, Lei [1]

Wang, Changshuo [1]

Dong, Pei [1]

Li, Xin [1]

Affiliation:

[1] Shandong University, Jinan, People's Republic of China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024 | 2024 / Volume 14872

Keywords:

Text-guided image editing; Diffusion model; Image manipulation;

DOI:

10.1007/978-981-97-5612-4_20

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works utilize diffusion models, and most studies focus on editing individual regions based on a given text prompt. When the user delineates multiple regions, these models cannot edit each region according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model, which can handle multiple regions and their corresponding texts, and which focuses on entity-level object editing and layout-level background coordination at different denoising steps. At the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure the accuracy of editing multiple objects. In layout-level background coordination, we not only encourage the noisy version of the original scene image to replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions of noise levels between edited image regions. We conduct extensive experiments showing that our model outperforms all baselines in terms of multi-object entity editing and background coordination.
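
The two mask-related ideas named in the abstract (dilating region masks, and filling the background with a noised copy of the original image at each denoising step) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the square structuring element, the `radius` parameter, and the use of a cumulative noise-schedule value `alpha_bar_t` are all assumptions made for illustration. OutwardLPF is not sketched, since the abstract does not specify how it is computed.

```python
import numpy as np

def dilate_mask(mask, radius=1):
    """Binary dilation with a (2*radius+1) square element (assumed shape)."""
    mask = mask.astype(bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def noise_original(x0, alpha_bar_t, rng):
    """Standard forward diffusion of x0 to step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def blend_background(x_t_edit, x0, masks, alpha_bar_t, rng, radius=1):
    """Keep edited content inside the dilated regions; elsewhere use the
    noised original image (hypothetical helper, H x W x C arrays)."""
    union = np.zeros(x0.shape[:2], dtype=bool)
    for m in masks:  # one mask per user-delineated region
        union |= dilate_mask(m, radius)
    x_t_orig = noise_original(x0, alpha_bar_t, rng)
    return np.where(union[..., None], x_t_edit, x_t_orig), union
```

In a real sampler, `blend_background` would be called once per denoising step with that step's `alpha_bar_t`, and each region would be denoised under its own text prompt before blending; both of those parts are omitted here.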

Citation

Pages: 229-240

Number of pages: 12

