Trans2Fuse: Empowering image fusion through self-supervised learning and multi-modal transformations via transformer networks

Cited by: 10
Authors
Qu, Linhao
Liu, Shaolei
Wang, Manning
Li, Shiman
Yin, Siqi
Song, Zhijian [1]
Affiliation
[1] Fudan Univ, Digital Med Res Ctr, Sch Basic Med Sci, Shanghai 200032, Peoples R China
Keywords
Image fusion; Transformer; Self-supervised learning; Deep learning; EXTRACTION;
DOI
10.1016/j.eswa.2023.121363
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image fusion produces a single enhanced image by integrating complementary information from multiple source images. Existing end-to-end fusion methods often suffer from overfitting or require intricate parameter tuning because task-specific training data are scarce. Two-stage approaches address this by training encoder-decoder networks on large natural image datasets, but their performance is limited by the domain gap between natural images and fusion tasks. In this work, we devise a novel encoder-decoder fusion framework and introduce a self-supervised training scheme based on destruction-reconstruction. The scheme facilitates task-specific feature learning through three auxiliary tasks: pixel intensity non-linear transformation for multi-modal fusion, brightness transformation for multi-exposure fusion, and noise transformation for multi-focus fusion. By randomly selecting one of these tasks at each step of model training, the different fusion tasks reinforce one another, which improves the generalizability of the network. We also design an encoder that combines a Convolutional Neural Network (CNN) and a Transformer to extract both local and global features. We evaluate the approach against 11 traditional and deep learning-based methods on four benchmark datasets covering infrared-visible fusion, medical fusion, multi-exposure fusion, and multi-focus fusion. Assessments with nine metrics from diverse viewpoints consistently demonstrate the superior performance of our approach in all scenarios. We will make our code, datasets, and fused images publicly available.
Pages: 16
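The destruction-reconstruction scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names the three auxiliary transformations but does not specify them, so the gamma curve, the scalar gain, the Gaussian noise, all parameter ranges, and every function name below are assumptions chosen only for illustration. Images are represented as nested lists of floats in [0, 1] so that the example needs nothing beyond the standard library.

```python
import random

def _map_pixels(img, fn):
    """Apply fn to every pixel of a 2-D image stored as a list of rows."""
    return [[fn(p) for p in row] for row in img]

def _clip01(x):
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, x))

def nonlinear_intensity(img, rng):
    # Assumption: a random gamma curve stands in for the
    # "pixel intensity non-linear transformation" task.
    gamma = rng.uniform(0.5, 2.0)
    return _map_pixels(img, lambda p: p ** gamma)

def brightness(img, rng):
    # Assumption: a random global gain stands in for the
    # "brightness transformation" task.
    gain = rng.uniform(0.5, 1.5)
    return _map_pixels(img, lambda p: _clip01(p * gain))

def add_noise(img, rng):
    # Assumption: additive Gaussian noise stands in for the
    # "noise transformation" task.
    sigma = rng.uniform(0.01, 0.1)
    return _map_pixels(img, lambda p: _clip01(p + rng.gauss(0.0, sigma)))

TASKS = (nonlinear_intensity, brightness, add_noise)

def make_training_pair(img, rng):
    """Pick one auxiliary task at random and return (destroyed, target, task name).

    The untouched input image is the reconstruction target.
    """
    task = rng.choice(TASKS)
    return task(img, rng), img, task.__name__

def mse(a, b):
    """Mean squared error between two images of the same size."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) / n

rng = random.Random(0)
clean = [[rng.random() for _ in range(8)] for _ in range(8)]
destroyed, target, task_name = make_training_pair(clean, rng)
print(task_name, mse(destroyed, target))
```

In an actual training loop, an encoder-decoder network would receive `destroyed` and be optimized to reproduce `target`; the error printed here is only the error of leaving the destroyed image unchanged, which a trained network would be expected to reduce.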