Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

Cited by: 3
Authors
Ham, Cusuh [1]
Hays, James [1]
Lu, Jingwan [2]
Singh, Krishna Kumar [2]
Zhang, Zhifei [2]
Hinz, Tobias [2]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Adobe Res, San Francisco, CA USA
Keywords
image synthesis; image generation; multimodal synthesis; neural networks; diffusion models
DOI
10.1145/3588432.3591549
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen during the original training of the diffusion model. We show that MCM enables user control over the spatial layout of the image and leads to increased control over the image generation process. Training MCM is cheap as it does not require gradients from the original diffusion net, consists of only ~1% of the number of parameters of the base diffusion model, and is trained using only a limited number of training examples. We evaluate our method on unconditional and text-conditional models to demonstrate the improved control over the generated images and their alignment with respect to the conditioning inputs.
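The idea in the abstract, a frozen diffusion network whose prediction is adjusted by a small trainable module driven by a 2D conditioning map, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names `frozen_denoiser`, `ToyMCM`, and `modulated_prediction` are hypothetical, and the per-pixel scale-and-shift form is only one plausible way to modulate a prediction, not necessarily the formulation used in the paper.

```python
def frozen_denoiser(x_t, t):
    """Stand-in for the pretrained diffusion network (hypothetical).

    Its behaviour is fixed: nothing in this sketch ever updates it.
    """
    return [[0.1 * v + 0.01 * t for v in row] for row in x_t]


class ToyMCM:
    """Hypothetical tiny module with two scalar weights.

    Maps a 2D conditioning map (e.g. a binary segmentation mask)
    to a per-pixel scale and shift.
    """

    def __init__(self, w_scale=0.0, w_shift=0.0):
        self.w_scale = w_scale
        self.w_shift = w_shift

    def __call__(self, cond):
        scale = [[self.w_scale * c for c in row] for row in cond]
        shift = [[self.w_shift * c for c in row] for row in cond]
        return scale, shift


def modulated_prediction(x_t, t, cond, mcm):
    """Adjust the frozen network's prediction using the conditioning map."""
    # The base prediction is treated as a constant input, so training
    # the module would not need gradients from the base network.
    eps = frozen_denoiser(x_t, t)
    scale, shift = mcm(cond)
    return [
        [e * (1.0 + s) + b for e, s, b in zip(e_row, s_row, b_row)]
        for e_row, s_row, b_row in zip(eps, scale, shift)
    ]


x_t = [[1.0, 2.0], [3.0, 4.0]]   # toy "noisy image"
mask = [[1.0, 0.0], [0.0, 1.0]]  # toy 2D conditioning map

unchanged = modulated_prediction(x_t, 10, mask, ToyMCM())
shifted = modulated_prediction(x_t, 10, mask, ToyMCM(w_shift=0.5))
```

With zero weights the module leaves the base prediction untouched; with a non-zero shift, only the pixels selected by the mask change. In this toy setup, only the two weights of `ToyMCM` would be trained, which mirrors the abstract's point that the module is far smaller than the base model.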
Pages: 11
Related papers
50 in total
  • [41] ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition
    Wang, Mengmeng
    Xing, Jiazheng
    Mei, Jianbiao
    Liu, Yong
    Jiang, Yunliang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 625 - 637
  • [43] Facial finetuning: using pretrained image classification models to predict politicians' success
    Lindholm, Asbjorn
    Hjorth, Christian
    Schuessler, Julian
    POLITICAL SCIENCE RESEARCH AND METHODS, 2024
  • [44] Zero-Shot Image Caption Inference System Based on Pretrained Models
    Zhang, Xiaochen
    Shen, Jiayi
    Wang, Yuyan
    Xiao, Jiacong
    Li, Jin
    ELECTRONICS, 2024, 13 (19)
  • [45] HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models
    Zhou, Xinrui
    Huang, Yuhao
    Xue, Wufeng
    Dou, Haoran
    Cheng, Jun
    Zhou, Han
    Ni, Dong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 361 - 371
  • [46] Gramian angular fields for leveraging pretrained computer vision models with anomalous diffusion trajectories
    Garibo-i-Orts, Oscar
    Firbas, Nicolas
    Sebastia, Laura
    Conejero, J. Alberto
    PHYSICAL REVIEW E, 2023, 107 (03)
  • [47] A Survey of Pretrained Language Models
    Sun, Kaili
    Luo, Xudong
    Luo, Michael Y.
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 442 - 456
  • [48] Flexible content-aware image synthesis for maritime tasks with diffusion models
    Xue, Zhenfeng
    Hu, Yuanqi
    Lu, Ankang
    Chen, Zhuo
    Zang, Ying
    Miao, Zhonghua
    APPLIED OCEAN RESEARCH, 2025, 158
  • [49] Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
    Wang, Ruichen
    Chen, Zekang
    Chen, Chen
    Ma, Jian
    Lu, Haonan
    Lin, Xiaodong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024 : 5544 - 5552
  • [50] Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models
    Chen, Tingxiu
    Shi, Yilei
    Zheng, Zixuan
    Yan, Bingcong
    Hu, Jingliang
    Zhu, Xiao Xiang
    Mou, Lichao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT IV, 2024, 15004 : 764 - 774