Bridge-GAN: Interpretable Representation Learning for Text-to-Image Synthesis

Cited: 44
Authors
Yuan, Mingkuan [1 ]
Peng, Yuxin [1 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text-to-image synthesis; interpretable representation learning; Bridge-GAN;
DOI
10.1109/TCSVT.2019.2953753
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808; 0809;
Abstract
Text-to-image synthesis aims to generate images whose content is consistent with a given text description. It is a highly challenging task with two main issues: visual realism and content consistency. Thanks to the significant progress of generative adversarial networks, it has recently become feasible to generate images with high visual realism. However, translating a text description into an image with high content consistency remains difficult. To address these issues, it is reasonable to establish a transitional space with interpretable representations as a bridge between text and image. We therefore propose a text-to-image synthesis approach named Bridge-like Generative Adversarial Networks (Bridge-GAN). Its main contributions are: (1) A transitional space is established as a bridge for improving content consistency, in which an interpretable representation is learned by preserving the key visual information of the given text description. (2) A ternary mutual information objective is designed to optimize the transitional space and enhance both visual realism and content consistency; it is formulated to disentangle the latent factors conditioned on the text description for interpretable representation learning. Comprehensive experiments on two widely used datasets verify the effectiveness of Bridge-GAN, which achieves the best performance.
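The pipeline described in the abstract (text → transitional "bridge" code → generated image, regularized by a mutual-information term) can be sketched at a very high level. This is a minimal illustrative sketch, not the authors' implementation: all names, dimensions, and the InfoGAN-style reconstruction surrogate for the mutual-information term are assumptions, and real networks would replace the single linear layers used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_encoder(text_embedding, W_enc):
    # Map a text embedding into the transitional ("bridge") space.
    return np.tanh(text_embedding @ W_enc)

def generator(bridge_code, noise, W_gen):
    # Generate a flattened "image" from the bridge code plus a noise vector.
    return np.tanh(np.concatenate([bridge_code, noise]) @ W_gen)

def mi_lower_bound(code, recovered_code):
    # Variational lower bound on mutual information, approximated (as in
    # InfoGAN-style objectives) by the negative reconstruction error of the
    # bridge code from the generated image.
    return -np.mean((code - recovered_code) ** 2)

# Hypothetical dimensions and randomly initialized linear "networks".
d_text, d_bridge, d_noise, d_img = 16, 8, 4, 32
W_enc = rng.normal(size=(d_text, d_bridge)) * 0.1
W_gen = rng.normal(size=(d_bridge + d_noise, d_img)) * 0.1
W_rec = rng.normal(size=(d_img, d_bridge)) * 0.1  # auxiliary recovery network

text = rng.normal(size=d_text)   # stand-in for a sentence embedding
z = rng.normal(size=d_noise)

bridge = text_encoder(text, W_enc)
fake_img = generator(bridge, z, W_gen)
recovered = np.tanh(fake_img @ W_rec)

mi_term = mi_lower_bound(bridge, recovered)
print(fake_img.shape, round(float(mi_term), 4))
```

In training, the adversarial loss (for visual realism) would be combined with this mutual-information term (for content consistency); the paper's objective is ternary in that it ties together text, the transitional representation, and the generated image, whereas this sketch only shows a single pairwise surrogate.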
Pages: 4258-4268
Page count: 11