Context Diffusion: In-Context Aware Image Generation

Cited by: 0

Authors:

Najdenkoska, Ivona [1,2]

Sinha, Animesh [1]

Dubey, Abhimanyu [1]

Mahajan, Dhruv [1]

Ramanathan, Vignesh [1]

Radenovic, Filip [1]

Affiliations:

[1] Meta GenAI, Menlo Park, CA 94025, USA

[2] University of Amsterdam, Amsterdam, Netherlands

Source:

Computer Vision - ECCV 2024, Pt. LXXVII | 2024, Vol. 15135

Keywords:

Image generation; Diffusion models; In-context learning;

DOI:

10.1007/978-3-031-72980-5_22

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and context fidelity of the generated images deteriorate when the prompt is not present, demonstrating that these models cannot truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context and the preservation of the desired image layout. This results in the ability to learn from the visual context and prompts, but also from either of them. Furthermore, we enable our model to handle few-shot settings, to effectively address diverse in-context learning scenarios. Our experiments and human evaluation demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, resulting in an overall enhancement in image quality and context fidelity compared to counterpart models.


Pages: 375-391

Page count: 17
