Context Diffusion: In-Context Aware Image Generation

Cited by: 0
Authors
Najdenkoska, Ivona [1, 2]
Sinha, Animesh [1]
Dubey, Abhimanyu [1]
Mahajan, Dhruv [1]
Ramanathan, Vignesh [1]
Radenovic, Filip [1]
Affiliations
[1] Meta GenAI, Menlo Park, CA 94025, USA
[2] University of Amsterdam, Amsterdam, Netherlands
Source
COMPUTER VISION - ECCV 2024, PART LXXVII | 2024 / Vol. 15135
Keywords
Image generation; Diffusion models; In-context learning
DOI
10.1007/978-3-031-72980-5_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and context fidelity of the generated images deteriorate when the prompt is absent, which shows that these models cannot truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context from the preservation of the desired image layout. As a result, the model can learn from the visual context and text prompts together, as well as from either one alone. Furthermore, we enable our model to handle few-shot settings, so that it can address diverse in-context learning scenarios. Our experiments and human evaluation demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, improving overall image quality and context fidelity compared to counterpart models.
Pages: 375-391
Page count: 17