Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis

Cited by: 26
Authors
Tan, Hongchen [1 ]
Liu, Xiuping [1 ]
Yin, Baocai [2 ]
Li, Xin [3 ,4 ]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Peoples R China
[2] Dalian Univ Technol, Dept Elect Informat & Elect Engn, Dalian 116024, Peoples R China
[3] Louisiana State Univ, Sch Elect Engn & Comp Sci, Baton Rouge, LA 70803 USA
[4] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
Semantics; Task analysis; Generative adversarial networks; Generators; Gallium nitride; Feature extraction; Visualization; Cross-modal semantic matching; generative adversarial network (GAN); text-to-image synthesis; Text_CNNs;
DOI
10.1109/TMM.2021.3060291
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Synthesizing photo-realistic images based on text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very challenging. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between the text description and the synthesized image for fine-grained text-to-image generation. CSM-GAN introduces two new modules: a Text Encoder Module (TEM) and a Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to pull each synthesized image and its corresponding text description closer together, in a global semantic embedding space, than any mismatched pair; this improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over representative state-of-the-art methods.
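To make the abstract's two modules more concrete, the following is a minimal PyTorch-style sketch of (a) a Text_CNN sentence encoder in the spirit of TEM and (b) a hinge-based cross-modal matching objective in the spirit of TVSMM, which pushes each matched image-text pair closer in a shared global embedding space than mismatched pairs. The names (TextCNNEncoder, matching_loss, margin, kernel_sizes) and the concrete formulation are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNEncoder(nn.Module):
    # Illustrative Text_CNN-style sentence encoder (role of TEM in the abstract):
    # 1-D convolutions of several kernel widths over word embeddings, max-pooled
    # and concatenated into a single global text feature.
    def __init__(self, vocab_size, emb_dim=300, feat_dim=256, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, feat_dim, k, padding=k // 2) for k in kernel_sizes]
        )
        self.proj = nn.Linear(feat_dim * len(kernel_sizes), feat_dim)

    def forward(self, tokens):                       # tokens: (batch, seq_len) word indices
        x = self.embed(tokens).transpose(1, 2)       # -> (batch, emb_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return F.normalize(self.proj(torch.cat(pooled, dim=1)), dim=1)

def matching_loss(img_emb, txt_emb, margin=0.2):
    # Illustrative cross-modal matching objective (role of TVSMM in the abstract):
    # each matched image-text pair should score higher (cosine similarity) than
    # any mismatched pair in the shared embedding space, by at least `margin`.
    # img_emb, txt_emb: (batch, feat_dim), assumed L2-normalized.
    sim = img_emb @ txt_emb.t()                      # (batch, batch) similarity matrix
    pos = sim.diag().unsqueeze(1)                    # similarities of matched pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_txt = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0.0)      # image vs. wrong texts
    cost_img = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0.0)  # text vs. wrong images
    return cost_txt.mean() + cost_img.mean()

In a CSM-GAN-style pipeline, such a matching term would be used alongside the usual adversarial losses; the exact architecture, loss weighting, and negative-sampling scheme are details of the paper and are not reproduced here.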
Pages: 832-845
Page count: 14
Related Papers
50 records in total
  • [1] Cross-modal Feature Alignment based Hybrid Attentional Generative Adversarial Networks for text-to-image synthesis
    Cheng, Qingrong
    Gu, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2020, 107
  • [2] Generative Adversarial Networks with Adaptive Semantic Normalization for text-to-image synthesis
    Huang, Siyue
    Chen, Ying
    DIGITAL SIGNAL PROCESSING, 2022, 120
  • [3] SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Yao, Jinliang
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [4] Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
    Cheng, Qingrong
    Wen, Keyu
    Gu, Xiaodong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7062 - 7075
  • [5] A survey of generative adversarial networks and their application in text-to-image synthesis
    Zeng, Wu
    Zhu, Heng-liang
    Lin, Chuan
    Xiao, Zheng-ying
    ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (12): 7142 - 7181
  • [6] TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks
    Ku, Hyeeun
    Lee, Minhyeok
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [7] A Comparative Study of Generative Adversarial Networks for Text-to-Image Synthesis
    Chopra, Muskaan
    Singh, Sunil K.
    Sharma, Akhil
    Gill, Shabeg Singh
    INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2022, 14 (01):
  • [8] Enhanced Text-to-Image Synthesis Conditional Generative Adversarial Networks
    Tan, Yong Xuan
    Lee, Chin Poo
    Neo, Mai
    Lim, Kian Ming
    Lim, Jit Yan
    IAENG International Journal of Computer Science, 2022, 49 (01) : 1 - 7
  • [9] Multi-scale dual-modal generative adversarial networks for text-to-image synthesis
    Jiang, Bin
    Huang, Yun
    Huang, Wei
    Yang, Chao
    Xu, Fangqiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (10) : 15061 - 15077