Bounded Editing: Multi-Object Image Manipulation with Region-Specific Control

Cited by: 0

Authors:

Kang, Mingyu ^{[1]}

Kim, Keon ^{[1]}

Choi, Yong Suk ^{[1]}

Affiliations:

[1] Hanyang Univ, Seoul, South Korea

Source:

40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2025

Funding:

National Research Foundation, Singapore;

Keywords:

Text-To-Image; Text-Guided-Image-Editing; Diffusion Models; Computer vision; TEXT;

DOI:

10.1145/3672608.3707793

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Recent diffusion-based models have achieved significant success in vision domains such as image generation and text-guided image manipulation. Text-guided image editing lets users modify specific objects and their attributes based on textual descriptions. However, when altering multiple objects, current image editing approaches are susceptible to unintended modifications of non-target regions or of other target regions in the image. Some studies depend on detailed masks, which are challenging to obtain for fine-grained image editing. To address these issues, we propose Bounded Editing, which allows for precise manipulation of specific areas. First, our approach separates the target and non-target regions of the image using bounding boxes. Second, we propose a guidance loss that enhances editing capabilities, enabling precise modifications to target objects while preventing undesired changes to the background. By integrating our method with an existing image editing framework, we achieve significant improvements over state-of-the-art methods. Extensive experiments demonstrate the effectiveness of our proposed approach in changing objects and modifying attributes such as colors and materials, especially in multi-object editing scenarios.
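
Illustration only: the abstract states that target and non-target regions are separated using bounding boxes, but this record gives no implementation details. The following is a minimal Python sketch of that general idea, not the authors' method. It assumes boxes are pixel coordinates (x0, y0, x1, y1) with exclusive upper bounds; the names `box_mask` and `split_regions` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def box_mask(height: int, width: int, box: Box) -> Mask:
    """Return a height x width binary mask that is 1 inside the box."""
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def split_regions(height: int, width: int, boxes: List[Box]) -> Tuple[List[Mask], Mask]:
    """Return one mask per target box, plus a mask of pixels covered by no box."""
    targets = [box_mask(height, width, b) for b in boxes]
    background = [
        [0 if any(m[y][x] for m in targets) else 1 for x in range(width)]
        for y in range(height)
    ]
    return targets, background
```

In a sketch like this, each target mask could restrict an edit to its own region, while the background mask marks pixels that should stay unchanged. How the paper's guidance loss actually uses the regions is not described in this record.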

References

Pages: 1122-1129

Page count: 8

37 references in total

[1] Blended Latent Diffusion. Avrahami, Omri; Fried, Ohad; Lischinski, Dani. ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04)

[2] Blended Diffusion for Text-driven Editing of Natural Images. Avrahami, Omri; Lischinski, Dani; Fried, Ohad. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 18187-18197

[3] Betker J., 2023, Computer Science, V2, P8

[4] LEDITS++: Limitless Image Editing using Text-to-Image Models. Brack, Manuel; Friedrich, Felix; Kornmeier, Katharina; Tsaban, Linoy; Schramowski, Patrick; Kersting, Kristian; Passos, Apolinario. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 8861-8870

[5] MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing. Cao, Mingdeng; Wang, Xintao; Qi, Zhongang; Shan, Ying; Qie, Xiaohu; Zheng, Yinqiang. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 22503-22513

[6] Training-Free Layout Control with Cross-Attention Guidance. Chen, Minghao; Laina, Iro; Vedaldi, Andrea. 2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024: 5331-5341

[7] Cho HS, 2024, Arxiv, DOI arXiv:2402.04625

[8] Couairon G., 2022, arXiv

[9] Dahary Omer, 2024, arXiv

[10] Dhariwal P, 2021, ADV NEUR IN, V34
