Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis

Cited by: 26
Authors
Tan, Hongchen [1 ]
Liu, Xiuping [1 ]
Yin, Baocai [2 ]
Li, Xin [3 ,4 ]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Peoples R China
[2] Dalian Univ Technol, Dept Elect Informat & Elect Engn, Dalian 116024, Peoples R China
[3] Louisiana State Univ, Sch Elect Engn & Comp Sci, Baton Rouge, LA 70803 USA
[4] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
Semantics; Task analysis; Generative adversarial networks; Generators; Gallium nitride; Feature extraction; Visualization; Cross-modal semantic matching; generative adversarial network (GAN); text-to-image synthesis; Text_CNNs;
DOI
10.1109/TMM.2021.3060291
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Synthesizing photo-realistic images based on text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very challenging. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between the text description and the synthesized image for fine-grained text-to-image generation. CSM-GAN introduces two new modules: a Text Encoder Module (TEM) and a Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to pull each synthesized image and its corresponding text description closer together, in a global semantic embedding space, than any mismatched pair; this improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over representative state-of-the-art methods.
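To make the abstract's two modules more concrete, the following is a minimal PyTorch-style sketch of (a) a Text_CNN sentence encoder in the spirit of TEM and (b) a hinge-based cross-modal matching objective in the spirit of TVSMM, which pushes each matched image-text pair closer in a shared global embedding space than mismatched pairs. The names (TextCNNEncoder, matching_loss, margin, kernel_sizes) and the concrete formulation are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNEncoder(nn.Module):
    # Illustrative Text_CNN-style sentence encoder (role of TEM in the abstract):
    # 1-D convolutions of several kernel widths over word embeddings, max-pooled
    # and concatenated into a single global text feature.
    def __init__(self, vocab_size, emb_dim=300, feat_dim=256, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, feat_dim, k, padding=k // 2) for k in kernel_sizes]
        )
        self.proj = nn.Linear(feat_dim * len(kernel_sizes), feat_dim)

    def forward(self, tokens):                       # tokens: (batch, seq_len) word indices
        x = self.embed(tokens).transpose(1, 2)       # -> (batch, emb_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return F.normalize(self.proj(torch.cat(pooled, dim=1)), dim=1)

def matching_loss(img_emb, txt_emb, margin=0.2):
    # Illustrative cross-modal matching objective (role of TVSMM in the abstract):
    # each matched image-text pair should score higher (cosine similarity) than
    # any mismatched pair in the shared embedding space, by at least `margin`.
    # img_emb, txt_emb: (batch, feat_dim), assumed L2-normalized.
    sim = img_emb @ txt_emb.t()                      # (batch, batch) similarity matrix
    pos = sim.diag().unsqueeze(1)                    # similarities of matched pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_txt = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0.0)      # image vs. wrong texts
    cost_img = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0.0)  # text vs. wrong images
    return cost_txt.mean() + cost_img.mean()

In a CSM-GAN-style pipeline, such a matching term would be used alongside the usual adversarial losses; the exact architecture, loss weighting, and negative-sampling scheme are details of the paper and are not reproduced here.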
Pages: 832-845
Page count: 14
Related Papers
50 records in total
  • [1] Cross-modal Feature Alignment based Hybrid Attentional Generative Adversarial Networks for text-to-image synthesis
    Cheng, Qingrong
    Gu, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2020, 107
  • [2] Generative Adversarial Networks with Adaptive Semantic Normalization for text-to-image synthesis
    Huang, Siyue
    Chen, Ying
    DIGITAL SIGNAL PROCESSING, 2022, 120
  • [3] SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Yao, Jinliang
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [4] Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
    Cheng, Qingrong
    Wen, Keyu
    Gu, Xiaodong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7062 - 7075
  • [5] A survey of generative adversarial networks and their application in text-to-image synthesis
    Zeng, Wu
    Zhu, Heng-liang
    Lin, Chuan
    Xiao, Zheng-ying
    ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (12): 7142 - 7181
  • [6] TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks
    Ku, Hyeeun
    Lee, Minhyeok
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [7] A Comparative Study of Generative Adversarial Networks for Text-to-Image Synthesis
    Chopra, Muskaan
    Singh, Sunil K.
    Sharma, Akhil
    Gill, Shabeg Singh
    INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2022, 14 (01):
  • [8] Enhanced Text-to-Image Synthesis Conditional Generative Adversarial Networks
    Tan, Yong Xuan
    Lee, Chin Poo
    Neo, Mai
    Lim, Kian Ming
    Lim, Jit Yan
    IAENG International Journal of Computer Science, 2022, 49 (01) : 1 - 7
  • [9] Multi-scale dual-modal generative adversarial networks for text-to-image synthesis
    Jiang, Bin
    Huang, Yun
    Huang, Wei
    Yang, Chao
    Xu, Fangqiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (10) : 15061 - 15077