Autoencoder-Based Collaborative Attention GAN for Multi-Modal Image Synthesis

Cited by: 9
Authors
Cao, Bing [1 ,2 ]
Cao, Haifang [1 ,3 ]
Liu, Jiaxu [1 ,3 ]
Zhu, Pengfei [1 ,3 ]
Zhang, Changqing [1 ,3 ]
Hu, Qinghua [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300403, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710000, Peoples R China
[3] Tianjin Univ, Haihe Lab Informat Technol Applicat Innovat, Tianjin 300403, Peoples R China
Keywords
Image synthesis; Collaboration; Task analysis; Generative adversarial networks; Feature extraction; Data models; Image reconstruction; Multi-modal image synthesis; collaborative attention; single-modal attention; multi-modal attention; translation; network
DOI
10.1109/TMM.2023.3274990
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Multi-modal images are required in a wide range of practical scenarios, from clinical diagnosis to public security. However, certain modalities may be incomplete or unavailable because of restricted imaging conditions, which commonly leads to decision bias in many real-world applications. Despite the significant advancement of existing image synthesis techniques, learning complementary information from multi-modal inputs remains challenging. To address this problem, we propose an autoencoder-based collaborative attention generative adversarial network (ACA-GAN) that uses available multi-modal images to generate the missing ones. The collaborative attention mechanism deploys a single-modal attention module and a multi-modal attention module to effectively extract complementary information from multiple available modalities. Considering the significant modal gap, we further develop an autoencoder network to extract the self-representation of the target modality, guiding the generative model to fuse target-specific information from multiple modalities. This considerably improves cross-modal consistency with the desired modality, thereby greatly enhancing the image synthesis performance. Quantitative and qualitative comparisons on various multi-modal image synthesis tasks highlight the superiority of our approach over several prior methods, demonstrating more precise and realistic results.
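The abstract describes the core idea (per-modality attention followed by an attention-weighted fusion across the available modalities) but gives no concrete module definitions. The sketch below is only an illustration of that two-stage scheme in NumPy; the function names, the channel-mean scoring, and the similarity-based modality weighting are all assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_modal_attention(feat):
    """Re-weight channels within one modality (illustrative stand-in
    for the paper's single-modal attention module)."""
    # feat: (C, N) feature map flattened over spatial positions
    scores = softmax(feat.mean(axis=1))        # one score per channel, (C,)
    return feat * scores[:, None]

def multi_modal_attention(feats):
    """Fuse the available modalities with weights reflecting each
    modality's agreement with the mean representation (illustrative)."""
    stacked = np.stack(feats)                  # (M, C, N)
    mean = stacked.mean(axis=0, keepdims=True) # (1, C, N)
    sim = (stacked * mean).sum(axis=(1, 2))    # (M,) agreement scores
    w = softmax(sim)                           # modality weights, sum to 1
    return (w[:, None, None] * stacked).sum(axis=0)  # fused (C, N)

# Toy run: two available modalities, 8 channels, 16 spatial positions.
rng = np.random.default_rng(0)
m1 = single_modal_attention(rng.standard_normal((8, 16)))
m2 = single_modal_attention(rng.standard_normal((8, 16)))
fused = multi_modal_attention([m1, m2])
```

In the actual ACA-GAN these stages would be learned layers inside a GAN generator, additionally guided by the autoencoder's self-representation of the target modality; the sketch only shows the data flow of collaborative attention.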
Pages: 995-1010 (16 pages)