ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

Cited by: 12
Authors
Chen, Jingwen [1 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Mei, Tao [3 ]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
diffusion models; text-to-image generation; style transfer;
DOI
10.1145/3581783.3612524
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation, which further enhances editability in content creation. Given an input text prompt and a style image, this task aims to produce stylized images that are both semantically relevant to the text prompt and aligned with the style image in style. To achieve this, we present a new diffusion model (ControlStyle) that upgrades a pre-trained text-to-image model with a trainable modulation network, enabling conditioning on both text prompts and style images. Moreover, diffusion style and content regularizations are simultaneously introduced to facilitate the learning of this modulation network with these diffusion priors, pursuing high-quality stylized text-to-image generation. Extensive experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results, surpassing a simple combination of a text-to-image model and conventional style transfer techniques.
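The abstract describes training a modulation network with a diffusion denoising objective plus style and content regularizations. As a minimal, hypothetical sketch of how such a combined objective could be composed, the snippet below uses a mean-squared denoising loss together with a Gram-matrix style term and a feature-matching content term; the Gram-matrix formulation, the weighting parameters, and all function names are assumptions drawn from conventional style-transfer practice, not from the paper itself.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, N) flattened feature map; Gram matrix captures style statistics
    c, n = feats.shape
    return (feats @ feats.T) / (c * n)

def diffusion_loss(pred_noise, true_noise):
    # Standard denoising objective: MSE between predicted and true noise
    return float(np.mean((pred_noise - true_noise) ** 2))

def style_loss(gen_feats, style_feats):
    # Match Gram matrices of generated and style-image features (assumed form)
    g_gen, g_style = gram_matrix(gen_feats), gram_matrix(style_feats)
    return float(np.mean((g_gen - g_style) ** 2))

def content_loss(gen_feats, content_feats):
    # Match raw features to preserve semantic content (assumed form)
    return float(np.mean((gen_feats - content_feats) ** 2))

def total_loss(pred_noise, true_noise, gen_feats, style_feats, content_feats,
               lam_style=1.0, lam_content=1.0):
    # Hypothetical combined objective: denoising + style + content regularizers
    return (diffusion_loss(pred_noise, true_noise)
            + lam_style * style_loss(gen_feats, style_feats)
            + lam_content * content_loss(gen_feats, content_feats))
```

With matching noise predictions and identical generated, style, and content features, all three terms vanish and the total loss is zero; in practice the weights `lam_style` and `lam_content` would trade off stylization strength against semantic fidelity.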
Pages: 7540 - 7548
Number of pages: 9