SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text-to-Image Generation

Cited by: 4
Authors
Xu, Yifei [1 ]
Xu, Xiaolong [2 ]
Gao, Honghao [3 ,4 ]
Xiao, Fu [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[4] Gachon Univ, Coll Future Ind, Gyeonggi 461701, South Korea
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Adaptation models; Image synthesis; Computational modeling; Training; Task analysis; Noise reduction; Personalized image generation; text-to-image generation; style personalization; diffusion model; image style similarity assessment
DOI
10.1109/TMM.2024.3399075
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Existing personalized text-to-image generation models face issues such as repeated training and insufficient generalization capability. We present an adaptive Style-Guided Diffusion Model (SGDM). Given a set of stylistically consistent images and prompts as inputs, SGDM generates images that align with the prompts while maintaining style consistency with the input images. SGDM first extracts features from the input style image and then combines style features from different depths. Finally, the style features are injected into the noise generation process of the original Stable Diffusion (SD) model by the style-guided module we propose. This strategy fully leverages the generative and generalization capabilities of the pre-trained text-to-image model to ensure the accuracy of the generated image's content. We also present a dataset construction method suited to style-personalized generation tasks of this kind, enabling the trained model to generate stylized images adaptively rather than being re-trained for each style. In addition, we propose an evaluation metric, StySim, to measure the style similarity between two images; under this metric, SGDM shows the best style personalization capability, while metrics such as FID, KID, and CLIPSIM indicate that SGDM maintains good text-to-image generation performance.
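The abstract does not reproduce StySim's formulation. As an illustration of the general idea only (scoring style similarity from features taken at several network depths), here is a minimal Gram-matrix-based sketch; the feature extractor, the choice of depths, and the cosine aggregation are assumptions for illustration, not the paper's actual definition:

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations
    that summarize texture/style while discarding spatial layout."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def style_similarity(feats_a, feats_b) -> float:
    """Cosine similarity between flattened Gram matrices, averaged over
    feature maps from several depths (hypothetical aggregation)."""
    sims = []
    for fa, fb in zip(feats_a, feats_b):
        ga = gram_matrix(fa).ravel()
        gb = gram_matrix(fb).ravel()
        denom = np.linalg.norm(ga) * np.linalg.norm(gb) + 1e-12
        sims.append(float(ga @ gb / denom))
    return sum(sims) / len(sims)
```

In practice the per-depth feature maps would come from a pre-trained backbone (e.g. intermediate convolutional activations); identical inputs score 1.0 under this cosine formulation.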
Pages: 9804-9813 (10 pages)