ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

Cited by: 12
Authors
Chen, Jingwen [1 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Mei, Tao [3 ]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
diffusion models; text-to-image generation; style transfer;
DOI
10.1145/3581783.3612524
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation, which further enhances editability in content creation. Given an input text prompt and a style image, this task aims to produce stylized images that are both semantically relevant to the text prompt and aligned with the style image in style. To achieve this, we present a new diffusion model (ControlStyle) that upgrades a pre-trained text-to-image model with a trainable modulation network, enabling conditioning on both text prompts and style images. Moreover, diffusion style and content regularizations are simultaneously introduced to facilitate the learning of this modulation network with these diffusion priors, pursuing high-quality stylized text-to-image generation. Extensive experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results, surpassing a simple combination of a text-to-image model and conventional style transfer techniques.
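The abstract describes training a modulation network with a diffusion denoising objective plus style and content regularizations. As a minimal, hypothetical sketch of how such a combined objective could be composed, the snippet below uses a mean-squared denoising loss together with a Gram-matrix style term and a feature-matching content term; the Gram-matrix formulation, the weighting parameters, and all function names are assumptions drawn from conventional style-transfer practice, not from the paper itself.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, N) flattened feature map; Gram matrix captures style statistics
    c, n = feats.shape
    return (feats @ feats.T) / (c * n)

def diffusion_loss(pred_noise, true_noise):
    # Standard denoising objective: MSE between predicted and true noise
    return float(np.mean((pred_noise - true_noise) ** 2))

def style_loss(gen_feats, style_feats):
    # Match Gram matrices of generated and style-image features (assumed form)
    g_gen, g_style = gram_matrix(gen_feats), gram_matrix(style_feats)
    return float(np.mean((g_gen - g_style) ** 2))

def content_loss(gen_feats, content_feats):
    # Match raw features to preserve semantic content (assumed form)
    return float(np.mean((gen_feats - content_feats) ** 2))

def total_loss(pred_noise, true_noise, gen_feats, style_feats, content_feats,
               lam_style=1.0, lam_content=1.0):
    # Hypothetical combined objective: denoising + style + content regularizers
    return (diffusion_loss(pred_noise, true_noise)
            + lam_style * style_loss(gen_feats, style_feats)
            + lam_content * content_loss(gen_feats, content_feats))
```

With matching noise predictions and identical generated, style, and content features, all three terms vanish and the total loss is zero; in practice the weights `lam_style` and `lam_content` would trade off stylization strength against semantic fidelity.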
Pages: 7540 - 7548
Number of pages: 9