SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation

Cited by: 4
Authors
Xu, Yifei [1 ]
Xu, Xiaolong [2 ]
Gao, Honghao [3 ,4 ]
Xiao, Fu [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[4] Gachon Univ, Coll Future Ind, Gyeonggi 461701, South Korea
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Adaptation models; Image synthesis; Computational modeling; Training; Task analysis; Noise reduction; Personalized image generation; text-to-image generation; style personalization; diffusion model; image style similarity assessment;
DOI
10.1109/TMM.2024.3399075
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Existing personalized text-to-image generation models face issues such as repeated training and insufficient generalization capability. We present an adaptive Style-Guided Diffusion Model (SGDM). Given a set of stylistically consistent images and prompts as inputs, SGDM generates images that align with the prompts while maintaining style consistency with the input images. SGDM first extracts features from the input style images and then combines style features from different depths. Finally, the style features are injected into the noise generation process of the original Stable Diffusion (SD) model by our proposed style-guided module. This strategy fully leverages the generative and generalization capabilities of the pre-trained text-to-image model to ensure the accuracy of the generated image's content. We present a dataset construction method suited to this kind of style personalization task, enabling the trained model to generate stylized images adaptively rather than being re-trained for each style. We also present an evaluation metric, StySim, to measure the style similarity between two images; under this metric, SGDM achieves the strongest style personalization. Metrics such as FID, KID, and CLIPSIM indicate that SGDM maintains good performance in text-to-image generation.
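The abstract describes a pipeline of multi-depth style feature extraction, fusion, and injection into the denoiser's output. A minimal sketch of that idea is shown below, assuming an AdaIN-style statistics transfer as the injection mechanism; all function names, the (mean, std) feature representation, and the guidance weight `alpha` are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def extract_style_features(style_img, num_depths=3):
    # Hypothetical multi-depth extraction: progressively downsample the
    # style image (a stand-in for encoder stages) and record per-depth
    # first- and second-order statistics.
    feats = []
    x = style_img
    for _ in range(num_depths):
        x = x[::2, ::2]
        feats.append((x.mean(), x.std()))
    return feats

def combine_depth_features(feats, weights=None):
    # Fuse per-depth (mean, std) statistics with a weighted average.
    if weights is None:
        weights = [1.0 / len(feats)] * len(feats)
    mu = sum(w * m for w, (m, _) in zip(weights, feats))
    sigma = sum(w * s for w, (_, s) in zip(weights, feats))
    return mu, sigma

def style_guided_injection(noise_pred, style_stats, alpha=0.5):
    # AdaIN-like injection: renormalize the denoiser's intermediate
    # output toward the fused style statistics, blended by alpha.
    mu_s, sigma_s = style_stats
    mu_c = noise_pred.mean()
    sigma_c = noise_pred.std() + 1e-8
    stylized = (noise_pred - mu_c) / sigma_c * sigma_s + mu_s
    return (1.0 - alpha) * noise_pred + alpha * stylized
```

With `alpha = 0` the original noise prediction is returned unchanged, so the pre-trained SD denoiser's behavior is preserved; increasing `alpha` pulls the output statistics toward the style image's.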
Pages: 9804-9813
Page count: 10
Related Papers
50 records
  • [31] JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
    Zeng, Yu
    Patel, Vishal M.
    Wang, Haochen
    Huang, Xun
    Wang, Ting-Chun
    Liu, Ming-Yu
    Balaji, Yogesh
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6786 - 6795
  • [32] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
    Nichol, Alex
    Dhariwal, Prafulla
    Ramesh, Aditya
    Shyam, Pranav
    Mishkin, Pamela
    McGrew, Bob
    Sutskever, Ilya
    Chen, Mark
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [33] Prior knowledge guided text to image generation
    Liu, An-An
    Sun, Zefang
    Xu, Ning
    Kang, Rongbao
    Cao, Jinbo
    Yang, Fan
    Qin, Weijun
    Zhang, Shenyuan
    Zhang, Jiaqi
    Li, Xuanya
    PATTERN RECOGNITION LETTERS, 2024, 177 : 89 - 95
  • [34] Adaptive prompt guided unified image restoration with latent diffusion model
    Lv, Xiang
    Shao, Mingwen
    Wan, Yecong
    Qiao, Yuanjian
    Wang, Changzhong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 146
  • [35] SMFS-GAN: Style-Guided Multi-class Freehand Sketch-to-Image Synthesis
    Cheng, Zhenwei
    Wu, Lei
    Li, Xiang
    Meng, Xiangxu
    COMPUTER GRAPHICS FORUM, 2024, 43 (06)
  • [36] StyleDrop: Text-to-Image Generation in Any Style
    Sohn, Kihyuk
    Ruiz, Nataniel
    Lee, Kimin
    Chin, Daniel Castro
    Blok, Irina
    Chang, Huiwen
    Barber, Jarred
    Jiang, Lu
    Entis, Glenn
    Li, Yuanzhen
    Hao, Yuan
    Essa, Irfan
    Rubinstein, Michael
    Krishnan, Dilip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [37] Generative adversarial text-to-image generation with style image constraint
    Wang, Zekang
    Liu, Li
    Zhang, Huaxiang
    Liu, Dongmei
    Song, Yu
    MULTIMEDIA SYSTEMS, 2023, 29 (06) : 3291 - 3303
  • [39] Conditional Text Image Generation with Diffusion Models
    Zhu, Yuanzhi
    Li, Zhaohai
    Wang, Tianwei
    He, Mengchao
    Yao, Cong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14235 - 14245
  • [40] Shifted Diffusion for Text-to-image Generation
    Zhou, Yufan
    Liu, Bingchen
    Zhu, Yizhe
    Yang, Xiao
    Chen, Changyou
    Xu, Jinhui
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10157 - 10166