Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Citations: 0
Authors
Li, Ruichen [1]
Wu, Lei [1]
Wang, Changshuo [1]
Dong, Pei [1]
Li, Xin [1]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024 | 2024 / Vol. 14872
Keywords
Text-guided image editing; Diffusion model; Image manipulation;
DOI
10.1007/978-981-97-5612-4_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works use diffusion models, and most studies focus on editing a single region based on a given text prompt. When the user delineates multiple regions, these models cannot edit each region according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model that handles multiple regions with their corresponding texts and focuses on entity-level object editing and layout-level background coordination at different denoising steps. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure accurate editing of multiple objects. For layout-level background coordination, we not only let a noisy version of the original scene image replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise level between edited image regions. We conduct extensive experiments showing that our model outperforms all baselines in multi-object entity editing and background coordination.
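The three ingredients named in the abstract (mask dilation, replacing background noise with a noised copy of the original image, and smoothing the transition around edited regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the dilation kernel, the noise schedule, or how OutwardLPF is defined. The function names `dilate`, `soften_outward`, `add_noise`, and `blend_step` are hypothetical, the square structuring element is an assumption, and `soften_outward` is a generic box blur standing in for a low-pass filter that only acts outside the mask. The forward-noising formula is the standard DDPM one.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element (assumed kernel)."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius, mode="constant")
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def soften_outward(mask: np.ndarray, radius: int) -> np.ndarray:
    """Generic stand-in for an outward low-pass filter (NOT the paper's OutwardLPF):
    box-blur the mask, then keep weight 1.0 inside the region so the blend
    weight only decays outside the mask boundary."""
    h, w = mask.shape
    m = mask.astype(np.float64)
    k = 2 * radius + 1
    padded = np.pad(m, radius, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return np.maximum(acc / (k * k), m)


def add_noise(x0: np.ndarray, alpha_bar_t: float, rng: np.random.Generator) -> np.ndarray:
    """Standard DDPM forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_step(x_edit_t, x0_original, weight, alpha_bar_t, rng):
    """Keep the edited latent where weight is 1 and the noised original where it is 0."""
    x_bg_t = add_noise(x0_original, alpha_bar_t, rng)
    return weight * x_edit_t + (1.0 - weight) * x_bg_t


# Toy usage with two user-drawn regions on an 8x8 "image" (all values illustrative).
rng = np.random.default_rng(0)
mask_a = np.zeros((8, 8), dtype=bool)
mask_a[3, 1:3] = True                      # a thin region that benefits from dilation
mask_b = np.zeros((8, 8), dtype=bool)
mask_b[5:7, 5:7] = True
regions = dilate(mask_a, 1) | dilate(mask_b, 1)
weight = soften_outward(regions, 1)
x0 = rng.standard_normal((8, 8))           # stands in for the original scene image
x_edit_t = rng.standard_normal((8, 8))     # stands in for the denoiser's current latent
x_t = blend_step(x_edit_t, x0, weight, alpha_bar_t=0.5, rng=rng)
```

In this sketch the edited latent is kept unchanged inside the dilated regions, the background is re-seeded from the original image at the matching noise level, and the blend weight falls off gradually outside each region instead of switching abruptly.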
Pages: 229-240
Page count: 12