Text to Image GANs with RoBERTa and Fine-grained Attention Networks

Cited by: 0

Authors:

Siddharth, M. [1]

Aarthi, R. [1]

Affiliations:

[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Engn, Coimbatore, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2021 / Vol. 12 / No. 12

Keywords:

Natural language processing; computer vision; GANs; AttnGAN; RoBERTa

DOI:

10.14569/IJACSA.2021.01212115

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Synthesizing new images from textual descriptions requires understanding the context of the text, which makes it a very challenging problem in natural language processing and computer vision. Existing systems use a Generative Adversarial Network (GAN) with a simple text encoder to generate images from their captions. This paper synthesizes images from textual descriptions on the Caltech-UCSD Birds dataset, using the Attentional Generative Adversarial Network (AttnGAN) as the baseline generative model and the pre-trained RoBERTa neural language model for word embeddings. The results are compared with those of the baseline AttnGAN model, and various analyses are conducted on using the RoBERTa text encoder in place of the simple encoder of the existing system. Several performance improvements over the baseline AttnGAN were observed. The FID score decreased from 23.98 for AttnGAN to 20.77 when the RoBERTa model was incorporated into AttnGAN.


Pages: 947-955

Page count: 9
