Modified GAN with Proposed Feature Set for Text-to-Image Synthesis

Cited by: 1

Authors:

Talasila, Vamsidhar [1]

Narasingarao, M. R. [2]

Mohan, V. Murali [1]

Affiliations:

[1] Deemed Univ, Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Guntur 522302, Andhra Pradesh, India

[2] GITAM Deemed Univ, Dept Comp Sci & Engn, Visakhapatnam Campus, Visakhapatnam 530045, Andhra Pradesh, India

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2023 / Vol. 37 / No. 04

Keywords:

Bi-LSTM; optimal GAN; proposed TF-IDF feature; SI-SSD scheme; text-to-image;

DOI:

10.1142/S0218001423540046

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automated synthesis of practical images from text would be useful and interesting; however, current AI systems are still far from this objective. Nevertheless, in recent years, powerful and generic Recurrent Neural Network (RNN) structures have been introduced to learn discriminative text feature representations. Meanwhile, Deep Convolutional GANs have started producing highly convincing images of specific categories, such as room interiors, album covers, and faces. In this research work, we develop a new model for text-to-image synthesis that comprises three phases: (i) feature extraction, (ii) text encoding, and (iii) optimal image synthesis. Initially, text features such as improved TF-IDF, bag of words, and N-gram are extracted from the text, and a Bi-LSTM is trained on them. During the encoding of an image from text, cross-modal feature grouping is performed. Further, the image is synthesized using a modified GAN (MGAN) with a new loss function. Here, for precise synthesis of images, the weights of the GAN are optimized using the Self-improved Social Ski-Driver (SI-SSD) optimization algorithm. Finally, the superiority of the proposed model is examined through a comparison with existing schemes.
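As a rough illustration of the feature-extraction phase named in the abstract, the sketch below computes bag-of-words counts, N-grams, and textbook TF-IDF weights for a pair of captions. It is only a minimal sketch under stated assumptions: the abstract does not describe how the "improved TF-IDF" differs from the standard formula, so the plain tf × log(N/df) form is used here, and the function names, whitespace tokenizer, and example captions are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's tokenizer is not specified.
    return text.lower().split()


def bag_of_words(tokens):
    # Bag of words: raw count of each token in one document.
    return Counter(tokens)


def ngrams(tokens, n=2):
    # Contiguous N-grams as tuples; returns an empty list if the text is shorter than n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(corpus_tokens):
    # Textbook TF-IDF (NOT the paper's improved variant):
    # tf = count / document length, idf = log(N / document frequency).
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    scores = []
    for tokens in corpus_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical captions, used only to exercise the functions.
captions = ["a small red bird", "a large white bird on a branch"]
corpus = [tokenize(c) for c in captions]

bow = bag_of_words(corpus[0])
bigrams = ngrams(corpus[0], 2)
weights = tf_idf(corpus)
```

In this toy corpus, terms that occur in both captions (such as "bird") receive a weight of zero, while terms unique to one caption (such as "red") receive a positive weight. In a pipeline like the one the abstract outlines, such features would then be fed to the Bi-LSTM text encoder.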


Pages: 24
