Multimodal Fusion Generative Adversarial Network for Image Synthesis

Cited by: 2
Authors
Zhao, Liang [1 ]
Hu, Qinghao [1 ]
Li, Xiaoyuan [1 ]
Zhao, Jingyuan [2 ]
Affiliations
[1] Dalian Univ Technol, Sch Software Technol, Dalian 116024, Peoples R China
[2] Dalian Univ Technol, Cent Hosp, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image synthesis; Semantics; Image quality; Generative adversarial networks; Attention mechanisms; Mathematical models; Birds; Feature fusion; generative adversarial network; text-to-image synthesis;
DOI
10.1109/LSP.2024.3404855
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Text-to-image synthesis has advanced significantly; however, a crucial limitation persists: textual descriptions often neglect essential background details, leading to blurred backgrounds and diminished image quality. To address this, we propose a multimodal fusion framework that integrates information from both the text and image modalities. Our approach introduces a background mask to compensate for missing textual descriptions of background elements. Additionally, we employ an adaptive channel attention mechanism to effectively exploit the fused features, dynamically accentuating informative feature maps. Furthermore, we introduce a novel fusion conditional loss, ensuring that generated images not only align with textual descriptions but also exhibit realistic backgrounds. Experimental evaluations on the Caltech-UCSD Birds 200 (CUB) and COCO datasets demonstrate the efficacy of our approach, which achieves a Fréchet Inception Distance (FID) of 15.38 on the CUB dataset, surpassing several state-of-the-art approaches.
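The letter does not reproduce its equations here, but the "adaptive channel attention" it describes — dynamically reweighting the fused text/image feature maps so that informative channels are accentuated — is commonly instantiated as a squeeze-and-excitation-style gate. The NumPy sketch below is an illustrative assumption of that pattern, not the authors' exact formulation; the function name, the `reduction` ratio, and the randomly initialized bottleneck MLP weights are all hypothetical stand-ins for learned parameters.

```python
import numpy as np

def adaptive_channel_attention(fused, reduction=4, seed=0):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    fused: array of shape (C, H, W), e.g. concatenated text/image feature maps.
    Returns feature maps of the same shape, each channel scaled by a gate in (0, 1).
    """
    C, H, W = fused.shape
    # Squeeze: global average pooling collapses each channel to one statistic.
    z = fused.mean(axis=(1, 2))                      # shape (C,)
    # Excite: two-layer bottleneck MLP. In a real model these weights are
    # learned; random values here only demonstrate the data flow.
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((C // reduction, C)) * 0.1
    W2 = rng.standard_normal((C, C // reduction)) * 0.1
    h = np.maximum(W1 @ z, 0.0)                      # ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ h)))           # sigmoid, per-channel in (0, 1)
    # Scale: broadcast the per-channel gate over each (H, W) feature map.
    return fused * gate[:, None, None]

feat = np.ones((8, 4, 4))                            # toy fused feature tensor
out = adaptive_channel_attention(feat)
print(out.shape)                                     # (8, 4, 4)
```

Because the sigmoid gate lies strictly in (0, 1), the mechanism can only suppress or preserve channels relative to their input magnitude, which matches the abstract's description of accentuating informative feature maps relative to less useful ones.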
Pages: 1865-1869
Page count: 5
Cited References
21 total
[1] Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining. Vector Quantized Diffusion Model for Text-to-Image Synthesis. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 10686-10696.
[2] Heusel, M. Advances in Neural Information Processing Systems, 2017, Vol. 30.
[3] Hu, M. Proceedings of the 11th International Conference on Learning Representations, 2023: 24.
[4] Jiang, Kui; Wang, Zhongyuan; Yi, Peng; Wang, Guangcheng; Lu, Tao; Jiang, Junjun. Edge-Enhanced GAN for Remote Sensing Image Superresolution. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(8): 5799-5812.
[5] Liao, Wentong; Hu, Kai; Yang, Michael Ying; Rosenhahn, Bodo. Text to Image Generation with Semantic-Spatial Aware GAN. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 18166-18175.
[6] Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Hays, James; Perona, Pietro; Ramanan, Deva; Dollar, Piotr; Zitnick, C. Lawrence. Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, Part V, 2014, 8693: 740-755.
[7] Liu, Deyin; Wu, Lin; Zheng, Feng; Liu, Lingqiao; Wang, Meng. Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(11): 8589-8601.
[8] Peng, Dunlu; Yang, Wuchen; Liu, Cong; Lu, Shuairui. SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis. Neural Networks, 2021, 138: 57-67.
[9] Qiao, Tingting; Zhang, Jing; Xu, Duanqing; Tao, Dacheng. MirrorGAN: Learning Text-to-image Generation by Redescription. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 1505-1514.
[10] Radford, A. Proceedings of Machine Learning Research, 2021, Vol. 139.