StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Cited by: 276

Authors:

Gal, Rinon ^{[1,2]}

Patashnik, Or ^{[1]}

Maron, Haggai ^{[2]}

Bermano, Amit H. ^{[1]}

Chechik, Gal ^{[2]}

Cohen-Or, Daniel ^{[1]}

Affiliations:

[1] Tel Aviv Univ, Tel Aviv, Israel

[2] NVIDIA, Tel Aviv, Israel

Source:

ACM TRANSACTIONS ON GRAPHICS | 2022 / Volume 41 / Issue 04

Keywords:

Generator Domain Adaptation; Text-Guided Content Generation; Zero-Shot Training;

DOI:

10.1145/3528223.3530164

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Can a generative model be trained to produce images from a specific domain, guided only by a text prompt, without seeing any image? In other words: can an image generator be trained "blindly"? Leveraging the semantic power of large-scale Contrastive Language-Image Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or infeasible to reach with existing methods. We conduct an extensive set of experiments across a wide range of domains. These demonstrate the effectiveness of our approach, and show that our models preserve the latent-space structure that makes generative models appealing for downstream tasks. Code and videos are available at: stylegan-nada.github.io/


Pages: 13
