Text to image synthesis with multi-granularity feature aware enhancement Generative Adversarial Networks

Cited by: 0
Authors
Dong, Pei [1]
Wu, Lei [1]
Li, Ruichen [1]
Meng, Xiangxu [1]
Meng, Lei [1]
Affiliations
[1] Shandong Univ, Sch Software, 1500 ShunHua Rd High Tech Ind Dev Zone, Jinan 250101, Peoples R China
Keywords
Generative adversarial network; Multi-granularity feature aware enhancement; Text-to-image; Autoregressive; Diffusion;
DOI
10.1016/j.cviu.2024.104042
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Synthesizing complex images from text is challenging. Compared to autoregressive and diffusion model-based methods, Generative Adversarial Network (GAN)-based methods have significant advantages in computational cost and generation efficiency, yet two limitations remain: first, these methods often refine all features output from the previous stage indiscriminately, without considering that these features are initialized gradually during the generation process; second, the sparse semantic constraints provided by the text description are typically ineffective for refining fine-grained features. These issues make it difficult to balance generation quality, computational cost, and inference speed. To address them, we propose a Multi-granularity Feature Aware Enhancement GAN (MFAE-GAN), which allows the refinement process to match the order in which features of different granularity are initialized. Specifically, MFAE-GAN (1) samples category-related coarse-grained features and instance-level detail-related fine-grained features at different generation stages, using different attention mechanisms in Coarse-grained Feature Enhancement (CFE) and Fine-grained Feature Enhancement (FFE), to guide the generation process spatially; (2) provides denser semantic constraints than textual semantic information through Multi-granularity Features Adaptive Batch Normalization (MFA-BN) when refining fine-grained features; and (3) adopts Global Semantics Preservation (GSP) to avoid losing global semantics when sampling features continuously. Extensive experimental results demonstrate that MFAE-GAN is competitive in both image generation quality and efficiency.
Pages: 11