Peer Is Your Pillar: A Data-Unbalanced Conditional GANs for Few-Shot Image Generation

Cited by: 1
Authors
Li, Ziqiang [1 ,2 ]
Wang, Chaoyue [3 ]
Rui, Xue [4 ]
Xue, Chao [3 ]
Leng, Jiaxu [5 ]
Fu, Zhangjie [1 ]
Li, Bin [6 ,7 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Engn Res Ctr Digital Forens, Nanjing 211544, Peoples R China
[2] Univ Sci & Technol China, Big Data & Decis Lab, Hefei 230026, Peoples R China
[3] JD Explore Acad, Beijing 102600, Peoples R China
[4] Nanjing Univ Informat Sci & Technol, Sch Emergency Management, Nanjing 211544, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Sch Comp Sci, Chongqing, Peoples R China
[6] Univ Sci & Technol China, Lab Big Data & Decis, Hefei 230026, Peoples R China
[7] CAS Key Lab Technol Geospatial Informat Proc & App, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Image synthesis; Training data; Generators; Pipelines; Generative adversarial networks; Semantics; Transfer learning; Data augmentation; Circuits and systems; Few-shot image generation; generative adversarial networks; image synthesis;
DOI
10.1109/TCSVT.2024.3485109
Chinese Library Classification
TM [electrical technology]; TN [electronic technology, communication technology];
Discipline Code
0808; 0809;
Abstract
Few-shot image generation aims to train generative models using a small number of training images. When only a few images are available (e.g., 10), Learning From Scratch (LFS) methods often generate images that closely resemble the training data, while Transfer Learning (TL) methods try to improve performance by leveraging prior knowledge from GANs pre-trained on large-scale datasets. However, current TL methods may not allow sufficient control over the degree of knowledge preserved from the source model, making them unsuitable when the source and target domains are not closely related. To address this, we propose a novel pipeline called Peer is your Pillar (PIP), which combines a target few-shot dataset with a peer dataset to create data-unbalanced conditional generation. Our approach includes a class embedding method that separates the class space from the latent space, and we use a direction loss based on pre-trained CLIP to improve image diversity. Experiments on various few-shot datasets demonstrate the advantages of the proposed PIP, which in particular reduces the training requirements of few-shot image generation.
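The CLIP-based direction loss mentioned in the abstract is commonly formulated as one minus the cosine similarity between an edit direction in CLIP image-embedding space and a corresponding direction in CLIP text-embedding space. The sketch below illustrates that general formulation with placeholder NumPy vectors standing in for CLIP encoder outputs; the paper's exact loss and embedding choices may differ.

```python
import numpy as np

def clip_direction_loss(src_img_emb, gen_img_emb, src_txt_emb, tgt_txt_emb):
    """Directional loss in CLIP embedding space (general form).

    The image-space direction (generated minus source image embedding) is
    encouraged to align with the text-space direction (target minus source
    text embedding). Returns 1 - cosine similarity, so 0 means perfectly
    aligned directions and 2 means opposite directions.
    """
    img_dir = gen_img_emb - src_img_emb
    txt_dir = tgt_txt_emb - src_txt_emb
    # Normalize both directions before taking the dot product.
    img_dir = img_dir / np.linalg.norm(img_dir)
    txt_dir = txt_dir / np.linalg.norm(txt_dir)
    return 1.0 - float(np.dot(img_dir, txt_dir))

# Toy example with 4-dimensional placeholder embeddings:
aligned = clip_direction_loss(np.zeros(4), np.array([1.0, 0, 0, 0]),
                              np.zeros(4), np.array([2.0, 0, 0, 0]))
opposite = clip_direction_loss(np.zeros(4), np.array([1.0, 0, 0, 0]),
                               np.zeros(4), np.array([-2.0, 0, 0, 0]))
```

In practice the embeddings would come from frozen CLIP image and text encoders, and the loss would be averaged over a batch of generated samples during GAN training.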
Pages: 1303-1317 (15 pages)