SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation

Cited by: 4
Authors
Xu, Yifei [1 ]
Xu, Xiaolong [2 ]
Gao, Honghao [3 ,4 ]
Xiao, Fu [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[4] Gachon Univ, Coll Future Ind, Gyeonggi 461701, South Korea
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Adaptation models; Image synthesis; Computational modeling; Training; Task analysis; Noise reduction; Personalized image generation; text-to-image generation; style personalization; diffusion model; image style similarity assessment;
DOI
10.1109/TMM.2024.3399075
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Existing personalized text-to-image generation models face issues such as repeated training and insufficient generalization capability. We present an adaptive Style-Guided Diffusion Model (SGDM). Given a set of stylistically consistent images and prompts as inputs, SGDM generates images that align with the prompts while maintaining style consistency with the input images. SGDM first extracts features from the input style images and then combines style features from different depths. Finally, the style features are injected into the noise generation process of the original Stable Diffusion (SD) model through our proposed style-guided module. This strategy fully leverages the generative and generalization capabilities of the pre-trained text-to-image model to ensure the accuracy of the generated image's content. We present a dataset construction method suited to style-personalized generation tasks of this kind, enabling the trained model to generate stylized images adaptively instead of being re-trained for each style. We also present an evaluation metric, StySim, to measure the style similarity between two images; on this metric, SGDM achieves the best style-personalization performance. Metrics such as FID, KID, and CLIPSIM indicate that SGDM maintains good performance in text-to-image generation.
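The pipeline described above (multi-depth style feature extraction, fusion, and injection into the denoiser) can be illustrated with a minimal NumPy sketch. All names here are hypothetical stand-ins, not the paper's actual module: per-channel mean/std is used as a simple style proxy (in the spirit of AdaIN-style statistics), whereas the real style-guided module operates inside SD's U-Net.

```python
import numpy as np

def style_stats(feat):
    """Per-channel mean/std of a feature map (H, W, C): a simple style proxy."""
    return feat.mean(axis=(0, 1)), feat.std(axis=(0, 1))

def fuse_depths(feat_list, weights=None):
    """Combine style statistics gathered at several encoder depths."""
    if weights is None:
        weights = [1.0 / len(feat_list)] * len(feat_list)
    means, stds = zip(*(style_stats(f) for f in feat_list))
    mu = sum(w * m for w, m in zip(weights, means))
    sigma = sum(w * s for w, s in zip(weights, stds))
    return mu, sigma

def style_guided_injection(noise_pred, mu, sigma, scale=0.1):
    """Shift/scale the denoiser's noise prediction toward the target style."""
    return noise_pred * (1.0 + scale * sigma) + scale * mu

# Toy usage: two feature maps from different "depths" of a style encoder.
feats = [np.ones((8, 8, 4)), np.full((4, 4, 4), 2.0)]
mu, sigma = fuse_depths(feats)
guided = style_guided_injection(np.zeros((4, 4, 4)), mu, sigma)
```

The sketch only conveys the data flow; the actual SGDM injects learned style features at each denoising step rather than constant channel statistics.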
Pages: 9804 - 9813
Page count: 10
Related Papers
50 records
  • [21] A Text-Guided Generation and Refinement Model for Image Captioning
    Wang, Depeng
    Hu, Zhenzhen
    Zhou, Yuanen
    Hong, Richang
    Wang, Meng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2966 - 2977
  • [22] Cross-domain image translation with a novel style-guided diversity loss design
    Li, Tingting
    Zhao, Huan
    Huang, Jing
    Li, Keqin
    KNOWLEDGE-BASED SYSTEMS, 2022, 255
  • [23] SGUNet: Style-guided UNet for adversely conditioned fundus image super-resolution
    Fan, Zhihao
    Dan, Tingting
    Liu, Baoyi
    Sheng, Xiaoqi
    Yu, Honghua
    Cai, Hongmin
    Elsevier B.V. (465): : 238 - 247
  • [24] Learning font-style space using style-guided discriminator for few-shot font generation
    Ul Hassan, Ammar
    Memon, Irfanullah
    Choi, Jaeyoung
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242
  • [25] Enhancing Style-Guided Image-to-Image Translation via Self-Supervised Metric Learning
    Mao, Qi
    Ma, Siwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8511 - 8526
  • [26] StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning
    Jing, Peiguang
    Liu, Xianyi
    Wang, Ji
    Wei, Yinwei
    Nie, Liqiang
    Su, Yuting
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 853 - 861
  • [27] Text-guided small molecule generation via diffusion model
    Luo, Yanchen
    Fang, Junfeng
    Li, Sihang
    Liu, Zhiyuan
    Wu, Jiancan
    Zhang, An
    Du, Wenjie
    Wang, Xiang
    ISCIENCE, 2024, 27 (11)
  • [28] Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation
    Wu, Zijie
    Wang, Yaonan
    Feng, Mingtao
    Xie, He
    Mian, Ajmal
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8895 - 8905
  • [29] Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
    Pan, Zhihong
    Zhou, Xin
    Tian, Hao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4450 - 4460
  • [30] Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
    Yang, Serin
    Hwang, Hyunmin
    Ye, Jong Chul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22816 - 22825