MGF-GAN: Multi Granularity Text Feature Fusion for Text-guided-Image Synthesis

Cited by: 1

Authors:

Wang, Xingfu [1]

Li, Xiangyu [1]

Hawbani, Ammar [1]

Zhao, Liang [2]

Alsamhi, Saeed Hamood [3,4]

Affiliations:

[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China

[2] Shenyang Aerosp Univ, Sch Comp Sci, Shenyang, Peoples R China

[3] Natl Univ Ireland, Insight Ctr Data Analyt, Galway, Ireland

[4] IBB Univ, Ibb, Yemen

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM | 2022

Keywords:

Text-guided-Image; GAN; Aspect-level; Semantic consistency;

DOI:

10.1109/TrustCom56396.2022.00197

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

We present results on the challenging problem of text-to-image synthesis. Our analysis of prevailing work shows that it often builds generative adversarial network (GAN) models from stacked structures and usually introduces multiple generator-discriminator pairs. The entanglement between the different generators degrades the quality of the final synthesized image. Some researchers have proposed single-stage network models to avoid this entanglement between multiple generators, but these models do not exploit the different granularities of information in unstructured natural language. To address this shortcoming, we propose MGF-GAN, a multi-granularity feature network that exploits text information at different granularities while retaining the advantages of a single-stage network. Specifically, we feed three granularities of text features (sentences, aspect words, and single words) into different stages of the model through spatial attention and channel attention mechanisms to gradually refine the synthetic image from global and local perspectives. In addition, we reconstruct the loss function based on a contrastive formulation to stabilize training and to keep the synthesized image semantically consistent with the natural-language description. Experiments on the CUB bird and COCO datasets demonstrate the effectiveness of MGF-GAN.


Pages: 1398-1403

Page count: 6
