MirrorGAN: Learning Text-to-image Generation by Redescription

Cited by: 372
Authors
Qiao, Tingting [1 ,3 ]
Zhang, Jing [2 ,3 ]
Xu, Duanqing [1 ]
Tao, Dacheng [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Automat, Hangzhou, Peoples R China
[3] Univ Sydney, FEIT, Sch Comp Sci, UBTECH Sydney AI Ctr, Sydney, NSW, Australia
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR.2019.00160
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Generating an image from a given text description has two goals: visual realism and semantic consistency. Although significant progress has been made in generating high-quality and visually realistic images using generative adversarial networks, guaranteeing semantic consistency between the text description and visual content remains very challenging. In this paper, we address this problem by proposing a novel global-local attentive and semantic preserving text-to-image-to-text framework called MirrorGAN. MirrorGAN exploits the idea of learning text-to-image generation by redescription and consists of three modules: a semantic text embedding module (STEM), a global-local collaborative attentive module for cascaded image generation (GLAM), and a semantic text regeneration and alignment module (STREAM). STEM generates word- and sentence-level embeddings. GLAM has a cascaded architecture for generating target images from coarse to fine scales, leveraging both local word attention and global sentence attention to progressively enhance the diversity and semantic consistency of the generated images. STREAM seeks to regenerate the text description from the generated image, which semantically aligns with the given text description. Thorough experiments on two public benchmark datasets demonstrate the superiority of MirrorGAN over other representative state-of-the-art methods.
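The three-module pipeline the abstract describes can be sketched in code. The following is a minimal, illustrative NumPy stand-in (not the authors' implementation, which uses deep attentional generators and an RNN captioner): `stem` maps tokens to word- and sentence-level embeddings, `glam` is collapsed to a single generation stage, and `stream` produces per-position token logits from the generated image so a redescription (cross-entropy) loss can be added on top of the usual GAN losses. All names, shapes, and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, IMG = 12, 8, 16  # toy vocabulary / embedding / image-feature sizes

# STEM: word- and sentence-level embeddings from token ids.
W_embed = rng.normal(size=(VOCAB, EMB))
def stem(tokens):
    words = W_embed[tokens]       # (T, EMB) word-level embeddings
    sent = words.mean(axis=0)     # (EMB,)  sentence-level embedding
    return words, sent

# GLAM (one stage of the coarse-to-fine cascade): generate an "image"
# feature conditioned on word-level attention and the global sentence code.
W_gen = rng.normal(size=(EMB, IMG))
def glam(words, sent, noise):
    attn = words.mean(axis=0)     # toy stand-in for local word attention
    return np.tanh((attn + sent + noise) @ W_gen)  # (IMG,)

# STREAM: redescribe the image as per-position token logits.
W_cap = rng.normal(size=(IMG, VOCAB))
def stream(img, length):
    logits = img @ W_cap                 # (VOCAB,) shared logits (toy)
    return np.tile(logits, (length, 1))  # (T, VOCAB)

def redescription_loss(tokens, logits):
    # Cross-entropy between the input caption and its regeneration:
    # the semantic-alignment term MirrorGAN adds to the GAN objective.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(tokens)), tokens].mean()

tokens = np.array([3, 7, 1, 9])                      # toy caption
words, sent = stem(tokens)
img = glam(words, sent, rng.normal(size=EMB))        # text -> image
loss = redescription_loss(tokens, stream(img, len(tokens)))  # image -> text
```

Minimizing `redescription_loss` alongside the adversarial losses pushes the generator toward images whose redescription matches the input caption, which is the "mirror" idea in the paper's title.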
Pages: 1505-1514
Page count: 10