LisaCLIP: Locally Incremental Semantics Adaptation towards Zero-shot Text-driven Image Synthesis

Times Cited: 2
Authors
Cao, An [1 ]
Zhou, Yilin [1 ]
Shen, Gang [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
image synthesis; style transfer; CLIP model; adaptive patch selection
DOI
10.1109/IJCNN54540.2023.10191516
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The automatic transfer of a plain photo into a desired synthetic style has attracted numerous users in photo editing, visual art, and entertainment applications. By connecting images and texts, the Contrastive Language-Image Pre-Training (CLIP) model enables text-driven style transfer without exploring the image's latent domain. However, the trade-off between content fidelity and stylization remains challenging. In this paper, we present LisaCLIP, a CLIP-based image synthesis framework that relies solely on the CLIP model to guide image manipulation with a depth-adaptive encoder-decoder network. Since an image patch's semantics depend on its size, LisaCLIP progressively downsizes the patches while adaptively selecting the most significant ones for further stylization. We introduce a multi-stage training strategy that speeds up LisaCLIP's convergence by decoupling the optimization objectives. Experiments on public datasets demonstrate that LisaCLIP supports a wide range of style transfer tasks and outperforms other state-of-the-art methods in balancing content and style.
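The coarse-to-fine patch selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `clip_score` is a hypothetical stand-in for a real CLIP image-text similarity (here replaced by mean intensity so the sketch runs without CLIP), and the top-fraction selection rule and halving schedule are assumptions.

```python
import numpy as np

def clip_score(patch, prompt):
    # Hypothetical stand-in for CLIP image-text similarity.
    # A real implementation would embed the patch and prompt with CLIP
    # and return their cosine similarity; here we use mean intensity
    # so the sketch is self-contained and runnable.
    return float(patch.mean())

def extract_patches(img, size):
    # Tile the image into non-overlapping size x size patches.
    h, w = img.shape[:2]
    return [((y, x), img[y:y + size, x:x + size])
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def select_patches(img, prompt, size, keep_frac=0.5):
    # Score every patch against the prompt and keep the
    # highest-scoring fraction for further stylization.
    patches = extract_patches(img, size)
    scored = sorted(patches, key=lambda p: clip_score(p[1], prompt),
                    reverse=True)
    k = max(1, int(len(scored) * keep_frac))
    return scored[:k]

def progressive_selection(img, prompt, start_size=32, stages=3):
    # Progressively downsize patches (coarse -> fine semantics),
    # selecting the most significant patches at each stage.
    size, selections = start_size, []
    for _ in range(stages):
        selections.append(select_patches(img, prompt, size))
        size = max(4, size // 2)
    return selections
```

In the actual framework the selected patches would drive a CLIP-based stylization loss at each stage; the sketch only shows the selection loop itself.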
Pages: 10