Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Authors
Li, Ruichen [1]
Wu, Lei [1]
Wang, Changshuo [1]
Dong, Pei [1]
Li, Xin [1]
Affiliation
[1] Shandong Univ, Jinan, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024 | 2024 / Vol. 14872
Keywords
Text-guided image editing; Diffusion model; Image manipulation;
DOI
10.1007/978-981-97-5612-4_20
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works use diffusion models, and most focus on editing a single region based on a given text prompt. When the user delineates multiple regions, these models cannot edit each area according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model that handles multiple regions with their corresponding texts, and focuses on entity-level object editing and layout-level background coordination at different denoising steps, respectively. At the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure the accuracy of editing multiple objects. For layout-level background coordination, we not only encourage a noisy version of the original scene image to replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise level between edited image regions. Extensive experiments show that our model outperforms all baselines in multi-object entity editing and background coordination.
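Two of the ideas named in the abstract, dilating thin region masks and filling the background with a noised copy of the original image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`dilate_mask`, `noise_to_step`, `blend_background`), the 3x3 structuring element, the single-channel image shape, and the DDPM-style forward-noising formula are all assumptions made for illustration, and OutwardLPF is not reproduced because the abstract does not define it.

```python
import numpy as np


def dilate_mask(mask, iterations=1):
    """Binary dilation with a 3x3 square element (assumed; the paper's kernel is unknown)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the nine shifted copies of the mask.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def noise_to_step(x0, alpha_bar_t, rng):
    """Noise a clean image to the level alpha_bar_t (standard DDPM forward process, assumed)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_background(x_t_edit, x0_orig, region_masks, alpha_bar_t, rng):
    """Keep the edited latent inside the union of regions; use the noised original elsewhere."""
    union = np.zeros(region_masks[0].shape, dtype=bool)
    for m in region_masks:
        union |= m.astype(bool)
    x_t_orig = noise_to_step(x0_orig, alpha_bar_t, rng)
    return np.where(union, x_t_edit, x_t_orig)
```

In an actual sampler, `blend_background` would be called once per denoising step with that step's cumulative noise coefficient, and each region would carry its own text prompt; both details are omitted here.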
Pages: 229 - 240
Page count: 12