Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks

Cited by: 1
Authors
Habib, Md Ahsan [1 ]
Wadud, Md Anwar Hussen [1 ,2 ]
Patwary, Md Fazlul Karim [3 ]
Rahman, Mohammad Motiur [1 ]
Mridha, M. F. [4 ]
Okuyama, Yuichi [5 ]
Shin, Jungpil [5 ]
Affiliations
[1] Mawlana Bhashani Sci & Technol Univ, Dept Comp Sci & Engn, Tangail 1902, Bangladesh
[2] Sunamgonj Sci & Technol Univ, Dept Comp Sci & Engn, Sunamganj 3000, Bangladesh
[3] Jahangirnagar Univ, Inst Informat Technol, Dhaka 1342, Bangladesh
[4] Amer Int Univ Bangladesh, Dept Comp Sci, Dhaka 1229, Bangladesh
[5] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 9658580, Japan
Keywords
attention mechanism; C-GAN; generative adversarial networks; text-to-image synthesis
DOI
10.1109/ACCESS.2024.3435541
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
The emergence of generative adversarial networks (GANs) has sparked substantial interest in synthesizing images from textual descriptions. This approach has proved versatile and user-friendly for producing conditioned images, with notable recent progress in diversity, visual realism, and semantic alignment. Despite these advances, the field still faces difficulties, such as generating high-resolution images containing multiple objects and developing trustworthy evaluation metrics that align with human perception. This study provides a comprehensive overview of the current state of text-to-image generation models, examines how they have evolved over the past five years, and proposes a taxonomy based on the level of supervision required. The paper highlights shortcomings, critically evaluates current approaches for assessing text-to-image synthesis models, and suggests directions for further research, including improving model training and architectural design, developing more reliable evaluation criteria, and refining datasets. Focused specifically on text-to-image synthesis, this review complements earlier surveys on generative adversarial networks and offers guidance for future work on the topic.
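The conditioning idea central to these models — feeding a text embedding into the generator alongside random noise — can be sketched as follows. This is a toy numpy illustration with made-up dimensions, not the architecture of any specific surveyed model; real systems use deep convolutional generators rather than a single linear map.

```python
import numpy as np

# Toy sketch of text-conditioned generation in a conditional GAN.
# All dimensions below are hypothetical illustration values.
rng = np.random.default_rng(0)

NOISE_DIM = 100          # latent noise vector z
TEXT_DIM = 256           # sentence embedding from a text encoder (assumed given)
IMG_PIXELS = 64 * 64 * 3 # flattened 64x64 RGB image

def generate(z, text_embedding, W, b):
    """Conditioning: concatenate noise with the text embedding, then map
    to pixel space. tanh keeps outputs in [-1, 1], the usual GAN range."""
    h = np.concatenate([z, text_embedding])
    return np.tanh(W @ h + b)

# Randomly initialized "generator" parameters (untrained, for shape only).
W = rng.normal(scale=0.01, size=(IMG_PIXELS, NOISE_DIM + TEXT_DIM))
b = np.zeros(IMG_PIXELS)

z = rng.normal(size=NOISE_DIM)
text_embedding = rng.normal(size=TEXT_DIM)  # stand-in for an encoded caption

fake_image = generate(z, text_embedding, W, b)
print(fake_image.shape)  # flattened; reshape to (64, 64, 3) for display
```

The discriminator in a conditional GAN receives the same text embedding together with a real or generated image, so both networks learn the text-image correspondence jointly.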
Pages: 178401-178440
Page count: 40