Autoencoder-Based Collaborative Attention GAN for Multi-Modal Image Synthesis

Cited by: 9
Authors
Cao, Bing [1 ,2 ]
Cao, Haifang [1 ,3 ]
Liu, Jiaxu [1 ,3 ]
Zhu, Pengfei [1 ,3 ]
Zhang, Changqing [1 ,3 ]
Hu, Qinghua [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300403, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710000, Peoples R China
[3] Tianjin Univ, Haihe Lab Informat Technol Applicat Innovat, Tianjin 300403, Peoples R China
Keywords
Image synthesis; Collaboration; Task analysis; Generative adversarial networks; Feature extraction; Data models; Image reconstruction; Multi-modal image synthesis; collaborative attention; single-modal attention; multi-modal attention; translation; network
DOI
10.1109/TMM.2023.3274990
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Multi-modal images are required in a wide range of practical scenarios, from clinical diagnosis to public security. However, certain modalities may be incomplete or unavailable because of restricted imaging conditions, which commonly leads to decision bias in many real-world applications. Despite the significant advancement of existing image synthesis techniques, learning complementary information from multi-modal inputs remains challenging. To address this problem, we propose an autoencoder-based collaborative attention generative adversarial network (ACA-GAN) that uses available multi-modal images to generate the missing ones. The collaborative attention mechanism deploys a single-modal attention module and a multi-modal attention module to effectively extract complementary information from multiple available modalities. Considering the significant modal gap, we further develop an autoencoder network to extract the self-representation of the target modality, guiding the generative model to fuse target-specific information from multiple modalities. This considerably improves cross-modal consistency with the desired modality, thereby greatly enhancing the image synthesis performance. Quantitative and qualitative comparisons on various multi-modal image synthesis tasks highlight the superiority of our approach over several prior methods, demonstrating more precise and realistic results.
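The abstract describes the core idea (per-modality attention followed by an attention-weighted fusion across the available modalities) but gives no concrete module definitions. The sketch below is only an illustration of that two-stage scheme in NumPy; the function names, the channel-mean scoring, and the similarity-based modality weighting are all assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_modal_attention(feat):
    """Re-weight channels within one modality (illustrative stand-in
    for the paper's single-modal attention module)."""
    # feat: (C, N) feature map flattened over spatial positions
    scores = softmax(feat.mean(axis=1))        # one score per channel, (C,)
    return feat * scores[:, None]

def multi_modal_attention(feats):
    """Fuse the available modalities with weights reflecting each
    modality's agreement with the mean representation (illustrative)."""
    stacked = np.stack(feats)                  # (M, C, N)
    mean = stacked.mean(axis=0, keepdims=True) # (1, C, N)
    sim = (stacked * mean).sum(axis=(1, 2))    # (M,) agreement scores
    w = softmax(sim)                           # modality weights, sum to 1
    return (w[:, None, None] * stacked).sum(axis=0)  # fused (C, N)

# Toy run: two available modalities, 8 channels, 16 spatial positions.
rng = np.random.default_rng(0)
m1 = single_modal_attention(rng.standard_normal((8, 16)))
m2 = single_modal_attention(rng.standard_normal((8, 16)))
fused = multi_modal_attention([m1, m2])
```

In the actual ACA-GAN these stages would be learned layers inside a GAN generator, additionally guided by the autoencoder's self-representation of the target modality; the sketch only shows the data flow of collaborative attention.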
Pages: 995-1010 (16 pages)