Text to Image GANs with RoBERTa and Fine-grained Attention Networks

Authors
Siddharth, M. [1]
Aarthi, R. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Engn, Coimbatore, Tamil Nadu, India
Keywords
Natural language processing; computer vision; GANs; AttnGAN; RoBERTa
DOI
10.14569/IJACSA.2021.01212115
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Synthesizing new images from textual descriptions requires understanding the context of the text, which makes it a challenging problem in natural language processing and computer vision. Existing systems use a Generative Adversarial Network (GAN) with a simple text encoder to generate images from their captions. This paper synthesizes images from textual descriptions on the Caltech-UCSD Birds dataset, using the Attentional Generative Adversarial Network (AttnGAN) as the baseline generative model and the pre-trained RoBERTa neural language model for word embeddings. The results are compared with the baseline AttnGAN model, and several analyses examine the effect of replacing the simple encoder of the existing system with the RoBERTa text encoder. Various performance improvements over the baseline AttnGAN were noted. The FID score (lower is better) decreased from 23.98 with AttnGAN to 20.77 when RoBERTa was incorporated into AttnGAN.
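For readers unfamiliar with the FID figures quoted in the abstract, the following is a minimal sketch of the Frechet distance that underlies FID. It assumes image features have already been extracted into NumPy arrays (in standard FID these come from an Inception network, which is not reproduced here). The function name frechet_distance and the variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features).
    Assumption: the features were extracted beforehand; this sketch
    does not run any image model.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (clip guards against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Relative FID reduction implied by the two scores reported in the abstract.
fid_attngan, fid_roberta = 23.98, 20.77
relative_drop = (fid_attngan - fid_roberta) / fid_attngan
```

With the reported scores, relative_drop works out to roughly 13 percent, which gives a sense of the size of the improvement the abstract describes.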
Pages: 947-955
Page count: 9