CYCLE-CONSISTENT DIVERSE IMAGE SYNTHESIS FROM NATURAL LANGUAGE

Cited by: 8
Authors
Chen, Zhi [1]
Luo, Yadan [1]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Source
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2019
Keywords
Image synthesis; image captioning; generative adversarial networks; cycle-consistency loss
DOI
10.1109/ICMEW.2019.00085
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Text-to-image translation has become an attractive yet challenging task in computer vision. Previous approaches tend to generate similar, or even monotonous, images for distinct texts and overlook the characteristics of specific sentences. In this paper, we aim to generate images from given texts while preserving the diverse appearances and modes of the objects or instances they describe. To achieve this, we propose a novel learning model named SuperGAN, which consists of two major components, an image synthesis network and a captioning model, arranged in a Cycle-GAN framework. SuperGAN adopts a cycle-consistent adversarial training strategy to learn an image generator such that the feature distribution of the generated images matches the distribution of the generic images. Meanwhile, a cycle-consistency loss is applied to ensure that the captions of the generated images stay close to the original texts. Extensive experiments on the benchmark dataset Oxford-flowers-102 demonstrate the validity and effectiveness of the proposed method. In addition, a new evaluation metric is proposed to measure the diversity of the synthetic results.
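The abstract does not give the exact form of the cycle-consistency loss, so the following is only a minimal sketch under stated assumptions. It treats the loss as the mean negative log-likelihood that a captioning model assigns to the original text tokens when it describes the generated image, and it adds that term to an adversarial loss with a weight. The function names, the `lam` weight, and the cross-entropy form are all hypothetical illustrations, not the paper's definitions.

```python
import math


def caption_cycle_loss(token_probs, target_ids):
    """Mean negative log-likelihood of the original caption tokens.

    token_probs: one probability distribution over the vocabulary per
                 caption position, as a hypothetical captioner might emit
                 for a generated image (list of lists of floats).
    target_ids:  token ids of the original input text (list of ints).
    """
    if len(token_probs) != len(target_ids):
        raise ValueError("one distribution is needed per target token")
    eps = 1e-12  # guards against log(0)
    nll = [-math.log(max(dist[t], eps))
           for dist, t in zip(token_probs, target_ids)]
    return sum(nll) / len(nll)


def generator_objective(adv_loss, cycle_loss, lam=1.0):
    """Assumed combination: adversarial term plus weighted cycle term."""
    return adv_loss + lam * cycle_loss


# Toy example with a 3-word vocabulary and a 2-token caption.
target = [0, 1]
faithful = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]    # caption matches the text
unfaithful = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]  # caption drifts from the text

loss_faithful = caption_cycle_loss(faithful, target)
loss_unfaithful = caption_cycle_loss(unfaithful, target)
```

In this toy setting the loss is lower when the captioner recovers the original tokens from the generated image, which is the behaviour the cycle-consistency constraint is meant to encourage.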
Pages: 459-464
Number of pages: 6