Locally Conditioned GANs: Self-Supervised Local Patch Representation Learning for Conditional Generation

Cited by: 0
Authors
Kim, Dongseob [1 ]
Shim, Hyunjung [2 ]
Affiliations
[1] Yonsei Univ, Sch Integrated Technol, Incheon 21983, South Korea
[2] Korea Adv Inst Sci & Technol KAIST, Kim Jaechul Grad Sch Artificial Intelligence, Seoul 02455, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation of Singapore
Keywords
Training; Image synthesis; Generative adversarial networks; Semantics; Vectors; Representation learning; Image reconstruction; Condition monitoring; Generative adversarial network; conditional generation; image composition; IMAGE GENERATION; OPTIMIZATION;
DOI
10.1109/ACCESS.2024.3418884
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Existing conditional generation models using generative adversarial networks (GANs) suffer from two common limitations: 1) they rely heavily on supervision, or 2) they perform well only in scenarios involving small changes. This study addresses both issues by introducing locally conditioned generative adversarial networks (LCGAN). Inspired by self-supervised representation learning, we devise intuitive learning signals and training tactics to learn a local patch encoding that yields a locally controllable GAN latent space. Powered by this local patch encoding and our novel loss design, the proposed model performs locally conditioned image generation while covering various attributes. Using LCGAN, ordinary users can easily design an image by browsing its patch-level appearance across various patch examples, including out-of-domain examples. In addition, LCGAN with latent optimization offers high-quality results in local editing. Experimental evaluations verify that our model is effective in both conditional generation and local editing, achieving both image quality and fidelity. Our method is preferred by 55.78% of user-study participants, and it achieves Fréchet inception distance scores of 16.24 and 15.01 on the FFHQ and AFHQ-cat datasets, respectively. In particular, a comprehensive user study supports that: 1) a trade-off between quality and fidelity exists in existing methods, and 2) our model is the first to alleviate this trade-off, showing potential for practical image-editing applications.
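The Fréchet inception distance (FID) reported above is a standard metric that compares the Gaussian statistics of Inception-network features from real and generated images. As a point of reference, a minimal sketch of the computation (the feature arrays `feats_real` and `feats_fake` are placeholders; in practice they come from an Inception-v3 pool layer):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """Frechet inception distance between two feature sets (rows = samples)."""
    # Fit a Gaussian (mean, covariance) to each set of feature vectors.
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; discard tiny
    # imaginary components introduced by numerical error.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower is better: identical feature distributions give a score near zero, so the paper's 15-16 range indicates generated images whose feature statistics lie close to the real data.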
Pages: 134115-134132
Number of pages: 18