SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation

Times cited: 4
Authors
Xu, Yifei [1]
Xu, Xiaolong [2]
Gao, Honghao [3,4]
Xiao, Fu [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[4] Gachon Univ, Coll Future Ind, Gyeonggi 461701, South Korea
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Adaptation models; Image synthesis; Computational modeling; Training; Task analysis; Noise reduction; Personalized image generation; text-to-image generation; style personalization; diffusion model; image style similarity assessment;
DOI
10.1109/TMM.2024.3399075
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing personalized text-to-image generation models suffer from the need for repeated training and from insufficient generalization. We present an adaptive Style-Guided Diffusion Model (SGDM). Given a set of stylistically consistent images and prompts as input, SGDM generates images that match the prompts while remaining consistent in style with the input images. SGDM first extracts features from the input style images and then combines style features from different depths. Finally, the style features are injected into the noise generation process of the original Stable Diffusion (SD) model through our proposed style-guided module. This strategy fully leverages the generative and generalization capabilities of the pre-trained text-to-image model to ensure that the content of the generated images is accurate. We also present a dataset construction method suited to style personalization tasks of this kind, which enables the trained model to generate stylized images adaptively instead of being retrained for each style. In addition, we propose an evaluation metric, StySim, to measure the style similarity between two images; on this metric, SGDM shows the best style personalization capability among the compared methods. Metrics such as FID, KID, and CLIPSIM further indicate that SGDM maintains good text-to-image generation performance.
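This record does not define StySim. As a hedged illustration of how a style similarity between two images can be scored from features taken at several network depths, the sketch below uses the classic Gram-matrix style representation from neural style transfer. This is a swapped-in, well-known technique and not the paper's StySim. It assumes each image has already been passed through some pretrained encoder that yields one C x N feature map (C channels, N flattened spatial positions) per depth, given here as nested lists. The function names `gram_matrix`, `cosine` and `style_similarity` and the per-depth `weights` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

FeatureMap = Sequence[Sequence[float]]  # C channels x N flattened spatial positions
Matrix = List[List[float]]


def gram_matrix(features: FeatureMap) -> Matrix:
    """Channel-by-channel inner products of a C x N feature map, divided by N."""
    n = len(features[0])
    return [
        [sum(fi[k] * fj[k] for k in range(n)) / n for fj in features]
        for fi in features
    ]


def cosine(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between two matrices treated as flat vectors."""
    va = [x for row in a for x in row]
    vb = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention (assumption): an all-zero Gram matrix scores 0
    return dot / (norm_a * norm_b)


def style_similarity(
    feats_a: Sequence[FeatureMap],
    feats_b: Sequence[FeatureMap],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean, over network depths, of Gram-matrix cosine similarity.

    Hypothetical stand-in for a style metric; not the paper's StySim.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both images need feature maps from the same depths")
    if weights is None:
        weights = [1.0] * len(feats_a)  # equal weight per depth by default
    total = sum(weights)
    score = sum(
        w * cosine(gram_matrix(fa), gram_matrix(fb))
        for w, fa, fb in zip(weights, feats_a, feats_b)
    )
    return score / total


if __name__ == "__main__":
    # Two toy "depths": a 2-channel x 3-position map and a 2 x 2 map.
    shallow = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    deep = [[1.0, 0.0], [0.0, 1.0]]
    # Same shallow features with spatial positions shuffled.
    shallow_shuffled = [[3.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
    deep_other = [[1.0, 1.0], [1.0, 1.0]]
    print(style_similarity([shallow, deep], [shallow_shuffled, deep]))
    print(style_similarity([shallow, deep], [shallow, deep_other]))
```

Because a Gram matrix sums over spatial positions, shuffling the same features spatially leaves the score unchanged, which is why Gram statistics are commonly read as describing style rather than content layout. Averaging over several depths lets both low-level (texture, color) and higher-level statistics contribute to the score.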
Pages: 9804-9813
Number of pages: 10