STRUCTURE-AWARE GENERATIVE ADVERSARIAL NETWORK FOR TEXT-TO-IMAGE GENERATION

Cited by: 0

Authors:

Chen, Wenjie [1,2]

Ni, Zhangkai [1,2]

Wang, Hanli [1,2]

Affiliations:

[1] Tongji University, Department of Computer Science and Technology, Shanghai, People's Republic of China

[2] Tongji University, Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai, People's Republic of China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Text-to-image generation; generative adversarial network; negative data augmentation;

DOI:

10.1109/ICIP49359.2023.10222100

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-to-image generation aims at synthesizing photo-realistic images from textual descriptions. Existing methods typically align images with the corresponding texts in a joint semantic space. However, the presence of the modality gap in the joint semantic space leads to misalignment. Meanwhile, the limited receptive field of the convolutional neural network leads to structural distortions of generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed for (1) semantically aligning multimodal features in the joint semantic space in a learnable manner; and (2) improving the structure and contour of generated images by the designed content-invariant negative samples. Experimental results show that SaGAN achieves over 30.1% and 8.2% improvements in terms of FID on the datasets of CUB and COCO when compared with the state-of-the-art approaches.


Pages: 2075-2079

Page count: 5

