SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text-to-Image Generation

Cited by: 5
Authors
Xu, Yifei [1 ]
Xu, Xiaolong [2 ]
Gao, Honghao [3 ,4 ]
Xiao, Fu [2 ]
Affiliations
[1] Nanjing University of Posts and Telecommunications, Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023, China
[2] Nanjing University of Posts and Telecommunications, School of Computer Science, Nanjing 210023, China
[3] Shanghai University, School of Computer Engineering and Science, Shanghai 200444, China
[4] Gachon University, College of Future Industry, Gyeonggi 461701, South Korea
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Adaptation models; Image synthesis; Computational modeling; Training; Task analysis; Noise reduction; Personalized image generation; Text-to-image generation; Style personalization; Diffusion model; Image style similarity assessment
DOI
10.1109/TMM.2024.3399075
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Existing personalized text-to-image generation models face issues such as repeated per-style training and insufficient generalization capability. We present an adaptive Style-Guided Diffusion Model (SGDM). Given a set of stylistically consistent images and a prompt as input, SGDM generates images that align with the prompt while maintaining style consistency with the input images. SGDM first extracts features from the input style images and then combines style features from different depths. Finally, the style features are injected into the noise generation process of the original Stable Diffusion (SD) model by the proposed style-guided module. This strategy fully leverages the generative and generalization capabilities of the pre-trained text-to-image model to ensure the content accuracy of the generated images. We present a dataset construction method suited to such style-personalization tasks, enabling the trained model to generate stylized images adaptively instead of being re-trained for each style. We also present an evaluation metric, StySim, for measuring the style similarity between two images; under this metric, SGDM shows the best style personalization capability among the compared methods, while FID, KID, and CLIPSIM indicate that SGDM maintains good text-to-image generation performance.
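The record describes the style-guided module only at a high level: style features from several encoder depths are combined and injected into SD's denoising process. The following minimal PyTorch sketch illustrates one plausible reading of that mechanism, not the authors' implementation; all names (StyleGuidedBlock, style_dims, the residual cross-attention design) are hypothetical assumptions.

```python
# A minimal sketch (assumptions, not the authors' code) of style guidance as
# the abstract describes it: per-depth style features are projected to a
# shared width, fused, and injected into a U-Net activation via residual
# cross-attention, leaving the pre-trained content pathway intact.
import torch
import torch.nn as nn


class StyleGuidedBlock(nn.Module):
    """Fuses multi-depth style features and injects them by cross-attention."""

    def __init__(self, latent_dim: int, style_dims: list[int], n_heads: int = 8):
        super().__init__()
        # One projection per encoder depth, mapping to the U-Net token width.
        self.proj = nn.ModuleList(nn.Linear(d, latent_dim) for d in style_dims)
        self.attn = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latent: torch.Tensor, style_feats: list[torch.Tensor]) -> torch.Tensor:
        # latent:      (B, N, latent_dim) U-Net tokens at one resolution
        # style_feats: one (B, N_i, style_dims[i]) tensor per encoder depth
        style = torch.cat([p(f) for p, f in zip(self.proj, style_feats)], dim=1)
        guided, _ = self.attn(query=self.norm(latent), key=style, value=style)
        return latent + guided  # residual injection preserves SD's own prediction


if __name__ == "__main__":
    block = StyleGuidedBlock(latent_dim=320, style_dims=[128, 256, 512])
    latent = torch.randn(2, 64, 320)
    feats = [torch.randn(2, 16, 128), torch.randn(2, 8, 256), torch.randn(2, 4, 512)]
    print(block(latent, feats).shape)  # torch.Size([2, 64, 320])
```

The residual form means the block can be zero-initialized so the frozen SD model's behavior is unchanged at the start of training, one common way such adapters leverage a pre-trained backbone.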
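StySim itself is not defined in this record, so the sketch below is a stand-in illustration of image style similarity assessment, not the paper's metric: the classic Gram-matrix statistics of VGG features (Gatys et al.), which compare feature correlations rather than content. The layer choice and normalization here are assumptions.

```python
# Hedged stand-in for a style-similarity metric (NOT the paper's StySim):
# compare Gram matrices of VGG-16 features at several depths.
import torch
import torchvision.models as models


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # Channel-wise feature correlations, normalized by feature size.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


@torch.no_grad()
def style_distance(img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
    # img_a, img_b: (B, 3, H, W) tensors, ImageNet-normalized.
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
    style_layers = {3, 8, 15, 22}  # relu1_2, relu2_2, relu3_3, relu4_3
    dist = torch.zeros(())
    x, y = img_a, img_b
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in style_layers:
            dist = dist + ((gram_matrix(x) - gram_matrix(y)) ** 2).mean()
        if i >= max(style_layers):
            break
    return dist  # lower = more similar style; e.g. exp(-dist) gives a similarity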
Pages: 9804-9813
Page count: 10