Transformer Based Conditional GAN for Multimodal Image Fusion

Cited by: 32
Authors
Zhang, Jun [1]
Jiao, Licheng [1]
Ma, Wenping [1]
Liu, Fang [1]
Liu, Xu [1]
Li, Lingling [1]
Chen, Puhua [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Joint Int Res Lab Intelligent Percept & Computat, Int Res Ctr Intelligent Percept & Computat, Minist, Sch Artificial Intelligence, Key Lab Intelligent Pe, Xian 710071, Peoples R China
Keywords
Image fusion; Generators; Training; Feature extraction; Transformers; Generative adversarial networks; Thermal sensors; Generative adversarial network; multimodal image fusion; transformer; GENERATIVE ADVERSARIAL NETWORK; MULTISCALE; FRAMEWORK; DEEP
DOI
10.1109/TMM.2023.3243659
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Multimodal image fusion is becoming increasingly important for multi-sensor information utilization. However, existing end-to-end image fusion frameworks ignore the integration of a priori knowledge and long-distance dependencies across domains, which makes network convergence and global image perception difficult in complex scenes. In this article, a conditional generative adversarial network with transformer (TCGAN) is proposed for multimodal image fusion. The generator produces a fused image that carries the content of the source images. The discriminators distinguish the fused image from the source images. Adversarial training drives the final fused image to retain the structural and textural details of the cross-modal images simultaneously. In particular, a wavelet fusion module makes the inputs contain as much image content from the different domains as possible. The extracted convolutional features interact in the multiscale cross-modal transformer fusion module to fully complement the associated information, which lets the generator attend to both local and global context. TCGAN fully considers the training efficiency of the adversarial process and the integrated retention of redundant information. Experiments on public datasets show that TCGAN produces fused images with highlighted targets and rich details, and that it converges quickly.
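The abstract does not specify how the wavelet fusion module combines its inputs. As a rough illustration of the general idea only, the sketch below applies a classical wavelet fusion rule, which is an assumption and not the rule used in TCGAN: a one-level Haar transform of each source image, averaging of the low-frequency band, and selection of the larger-magnitude coefficient in each detail band. The function names `haar_dwt2`, `haar_idwt2`, and `wavelet_fuse` are hypothetical.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of a list-of-lists image (even height and width)."""
    h, w = len(img), len(img[0])
    bands = {k: [[0.0] * (w // 2) for _ in range(h // 2)]
             for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            bands["LL"][r][s] = (a + b + c + d) / 2  # low-frequency approximation
            bands["LH"][r][s] = (a - b + c - d) / 2  # left-right difference
            bands["HL"][r][s] = (a + b - c - d) / 2  # top-bottom difference
            bands["HH"][r][s] = (a - b - c + d) / 2  # diagonal difference
    return bands


def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild each 2x2 block from its four coefficients."""
    h2, w2 = len(bands["LL"]), len(bands["LL"][0])
    img = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for r in range(h2):
        for s in range(w2):
            ll, lh = bands["LL"][r][s], bands["LH"][r][s]
            hl, hh = bands["HL"][r][s], bands["HH"][r][s]
            img[2 * r][2 * s] = (ll + lh + hl + hh) / 2
            img[2 * r][2 * s + 1] = (ll - lh + hl - hh) / 2
            img[2 * r + 1][2 * s] = (ll + lh - hl - hh) / 2
            img[2 * r + 1][2 * s + 1] = (ll - lh - hl + hh) / 2
    return img


def wavelet_fuse(img_a, img_b):
    """Assumed fusion rule: average LL, keep the larger-magnitude detail coefficient."""
    a, b = haar_dwt2(img_a), haar_dwt2(img_b)
    fused = {"LL": [[(x + y) / 2 for x, y in zip(ra, rb)]
                    for ra, rb in zip(a["LL"], b["LL"])]}
    for k in ("LH", "HL", "HH"):
        fused[k] = [[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)]
                    for ra, rb in zip(a[k], b[k])]
    return haar_idwt2(fused)


# Toy inputs: a flat bright "infrared" patch and a "visible" patch with a vertical edge.
infrared = [[10, 10], [10, 10]]
visible = [[0, 4], [0, 4]]
print(wavelet_fuse(infrared, visible))  # [[4.0, 8.0], [4.0, 8.0]]
```

In this toy case the fused patch keeps the vertical edge from the visible input while its mean brightness is pulled toward the infrared input, which is the kind of cross-domain content mixing the abstract attributes to the wavelet fusion module.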
Pages: 8988-9001
Page count: 14