LisaCLIP: Locally Incremental Semantics Adaptation towards Zero-shot Text-driven Image Synthesis

Times Cited: 2
Authors
Cao, An [1 ]
Zhou, Yilin [1 ]
Shen, Gang [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
image synthesis; style transfer; CLIP model; adaptive patch selection
DOI
10.1109/IJCNN54540.2023.10191516
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The automatic transfer of a plain photo into a desired synthetic style has attracted numerous users in photo editing, visual art, and entertainment applications. By connecting images and texts, the Contrastive Language-Image Pre-Training (CLIP) model enables text-driven style transfer without exploring the image's latent domain. However, the trade-off between content fidelity and stylization remains challenging. In this paper, we present LisaCLIP, a CLIP-based image synthesis framework that relies solely on the CLIP model to guide image manipulation with a depth-adaptive encoder-decoder network. Since an image patch's semantics depend on its size, LisaCLIP progressively downsizes the patches while adaptively selecting the most significant ones for further stylization. We introduce a multi-stage training strategy that speeds up LisaCLIP's convergence by decoupling the optimization objectives. Experiments on public datasets demonstrate that LisaCLIP supports a wide range of style transfer tasks and outperforms other state-of-the-art methods in balancing content and style.
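The coarse-to-fine patch selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `clip_score` is a hypothetical stand-in for a real CLIP image-text similarity (here replaced by mean intensity so the sketch runs without CLIP), and the top-fraction selection rule and halving schedule are assumptions.

```python
import numpy as np

def clip_score(patch, prompt):
    # Hypothetical stand-in for CLIP image-text similarity.
    # A real implementation would embed the patch and prompt with CLIP
    # and return their cosine similarity; here we use mean intensity
    # so the sketch is self-contained and runnable.
    return float(patch.mean())

def extract_patches(img, size):
    # Tile the image into non-overlapping size x size patches.
    h, w = img.shape[:2]
    return [((y, x), img[y:y + size, x:x + size])
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def select_patches(img, prompt, size, keep_frac=0.5):
    # Score every patch against the prompt and keep the
    # highest-scoring fraction for further stylization.
    patches = extract_patches(img, size)
    scored = sorted(patches, key=lambda p: clip_score(p[1], prompt),
                    reverse=True)
    k = max(1, int(len(scored) * keep_frac))
    return scored[:k]

def progressive_selection(img, prompt, start_size=32, stages=3):
    # Progressively downsize patches (coarse -> fine semantics),
    # selecting the most significant patches at each stage.
    size, selections = start_size, []
    for _ in range(stages):
        selections.append(select_patches(img, prompt, size))
        size = max(4, size // 2)
    return selections
```

In the actual framework the selected patches would drive a CLIP-based stylization loss at each stage; the sketch only shows the selection loop itself.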
Pages: 10