Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

Cited by: 3
Authors
Ham, Cusuh [1]
Hays, James [1]
Lu, Jingwan [2]
Singh, Krishna Kumar [2]
Zhang, Zhifei [2]
Hinz, Tobias [2]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Adobe Res, San Francisco, CA USA
Keywords
image synthesis; image generation; multimodal synthesis; neural networks; diffusion models;
DOI
10.1145/3588432.3591549
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen during the original training of the diffusion model. We show that MCM enables user control over the spatial layout of the image and leads to increased control over the image generation process. Training MCM is cheap as it does not require gradients from the original diffusion network, consists of only approximately 1% of the number of parameters of the base diffusion model, and is trained using only a limited number of training examples. We evaluate our method on unconditional and text-conditional models to demonstrate the improved control over the generated images and their alignment with respect to the conditioning inputs.
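The idea in the abstract, a small trainable module that adjusts a frozen network's prediction using a 2D conditioning map, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stand-in `frozen_denoiser`, the class `ModulationModule`, the helper `modulated_prediction`, and in particular the per-pixel scale-and-shift form of the modulation, which the abstract does not specify. It is not the authors' implementation.

```python
import numpy as np


def frozen_denoiser(x_t, t):
    """Hypothetical stand-in for a pretrained diffusion network.

    It is only ever called as a black box: its parameters are never
    updated and no gradients are taken through it.
    """
    return 0.1 * x_t + 0.01 * t


class ModulationModule:
    """Small per-pixel linear map (a 1x1 convolution in spirit).

    Input channels: noisy image, base prediction, conditioning map, timestep.
    Output channels: a scale map and a shift map (assumed modulation form).
    """

    def __init__(self, in_channels=4):
        # Zero initialization means the module starts as the identity:
        # the base prediction passes through unchanged.
        self.w = np.zeros((2, in_channels))
        self.b = np.zeros(2)

    def __call__(self, x_t, eps, cond, t):
        t_map = np.full_like(x_t, t)
        feats = np.stack([x_t, eps, cond, t_map], axis=0)  # (4, H, W)
        out = np.tensordot(self.w, feats, axes=1) + self.b[:, None, None]
        scale, shift = out  # each (H, W)
        return eps * (1.0 + scale) + shift


def modulated_prediction(x_t, t, cond, module):
    """Query the frozen network, then let the small module adjust its output."""
    eps = frozen_denoiser(x_t, t)
    return module(x_t, eps, cond, t)


if __name__ == "__main__":
    x_t = np.ones((4, 4))
    seg_map = np.zeros((4, 4))  # e.g. a one-channel segmentation map
    module = ModulationModule()
    print(modulated_prediction(x_t, 0.5, seg_map, module).shape)
```

In this sketch only `module.w` and `module.b` would be trained, which mirrors the abstract's point that the base network's parameters stay fixed and that the added module is small relative to the base model.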
Pages: 11