DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Cited by: 1
Authors
Zeng, Chong [1 ]
Dong, Yue [2 ]
Peers, Pieter [3 ]
Kong, Youkang [4 ]
Wu, Hongzhi [1 ]
Tong, Xin [2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Coll William & Mary, Williamsburg, VA USA
[4] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS | 2024
Keywords
Diffusion; Image Synthesis; Lighting Control; Radiance Hints
DOI
10.1145/3641519.3657396
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffusion models already have the ability to generate images under any lighting condition, without additional guidance these models tend to correlate image content and lighting. Moreover, text prompts lack the necessary expressive power to describe detailed lighting setups. To provide the content creator with fine-grained control over the lighting during image generation, we augment the text prompt with detailed lighting information in the form of radiance hints, i.e., visualizations of the scene geometry with a homogeneous canonical material under the target lighting. However, the scene geometry needed to produce the radiance hints is unknown. Our key observation is that we only need to guide the diffusion process, hence exact radiance hints are not necessary; we only need to point the diffusion model in the right direction. Based on this observation, we introduce a three-stage method for controlling the lighting during image generation. In the first stage, we leverage a standard pretrained diffusion model to generate a provisional image under uncontrolled lighting. Next, in the second stage, we resynthesize and refine the foreground object in the provisional image by passing the target lighting to a refined diffusion model, named DiLightNet, using radiance hints computed on a coarse shape of the foreground object inferred from the provisional image. To retain the texture details, we multiply the radiance hints with a neural encoding of the provisional synthesized image before passing it to DiLightNet. Finally, in the third stage, we resynthesize the background to be consistent with the lighting on the foreground object. We demonstrate and validate our lighting-controlled diffusion model on a variety of text prompts and lighting conditions.
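The abstract's three-stage pipeline can be outlined in code. The following is a minimal sketch only, not the authors' released implementation: the helper functions (estimate_coarse_shape, render_radiance_hints, run_dilightnet, background_mask), the environment-map filename, and the Stable Diffusion model identifiers are assumptions used for illustration.

import torch
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline


def estimate_coarse_shape(image):
    """Placeholder: infer a coarse depth/normal proxy of the foreground object,
    e.g. with an off-the-shelf monocular depth or normal estimator."""
    raise NotImplementedError


def render_radiance_hints(coarse_shape, target_lighting):
    """Placeholder: render the coarse shape with homogeneous canonical materials
    under the target lighting to produce the radiance hints."""
    raise NotImplementedError


def run_dilightnet(prompt, provisional_image, radiance_hints):
    """Placeholder: DiLightNet, the refined diffusion model; the radiance hints
    are multiplied with a neural encoding of the provisional image before being
    passed in as conditioning."""
    raise NotImplementedError


def background_mask(coarse_shape):
    """Placeholder: binary mask covering everything outside the foreground
    object, used to inpaint a lighting-consistent background."""
    raise NotImplementedError


device = "cuda" if torch.cuda.is_available() else "cpu"
prompt = "a ceramic toy owl on a wooden table"
target_lighting = "studio_envmap.hdr"  # assumed environment-map input

# Stage 1: provisional image under uncontrolled lighting with a standard
# pretrained text-to-image diffusion model (model id is an assumption).
base = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1").to(device)
provisional = base(prompt).images[0]

# Stage 2: resynthesize the foreground object under the target lighting.
coarse_shape = estimate_coarse_shape(provisional)
hints = render_radiance_hints(coarse_shape, target_lighting)
foreground = run_dilightnet(prompt, provisional, hints)

# Stage 3: inpaint a background consistent with the relit foreground.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting").to(device)
final = inpaint(prompt=prompt, image=foreground,
                mask_image=background_mask(coarse_shape)).images[0]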
Pages: 12