Trans2Fuse: Empowering image fusion through self-supervised learning and multi-modal transformations via transformer networks

Cited by: 10

Authors:

Qu, Linhao

Liu, Shaolei

Wang, Manning

Li, Shiman

Yin, Siqi

Song, Zhijian [1]

Affiliations:

[1] Fudan Univ, Digital Med Res Ctr, Sch Basic Med Sci, Shanghai 200032, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 236

Keywords:

Image fusion; Transformer; Self-supervised learning; Deep learning; EXTRACTION;

DOI:

10.1016/j.eswa.2023.121363

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image fusion enhances a single image by integrating information from multiple sources with complementary data. Existing end-to-end fusion methods often suffer from overfitting or require intricate parameter tuning because task-specific training data are scarce. To address this, two-stage approaches utilize encoder-decoder networks trained on extensive natural image datasets, yet suffer from limited performance due to domain disparities. In this work, we devise a novel encoder-decoder fusion framework and introduce a self-supervised scheme based on destruction-reconstruction. This approach facilitates task-specific feature learning through three auxiliary tasks: pixel intensity non-linear transformation for multi-modal fusion, brightness transformation for multi-exposure fusion, and noise transformation for multi-focus fusion. Randomly selecting one task during model training lets the different fusion tasks reinforce one another, enhancing network generalizability. We also design an encoder that combines a Convolutional Neural Network (CNN) and a Transformer to extract both local and global features. We evaluate against 11 traditional and deep learning-based methods on four benchmark datasets covering infrared-visible fusion, medical fusion, multi-exposure fusion, and multi-focus fusion. Comprehensive assessments, encompassing nine metrics from diverse viewpoints, consistently demonstrate the superior performance of our approach in all scenarios. We will make our code, datasets, and fused images publicly available.
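
The destruction-reconstruction scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract names the three auxiliary tasks but gives no formulas, so everything concrete here is an assumption made for illustration: the gamma curve standing in for the non-linear intensity transformation, the multiplicative brightness scaling, the additive Gaussian noise, the parameter ranges, and all function names (`nonlinear_intensity`, `brightness`, `noise`, `destroy`, `reconstruction_loss`) are hypothetical and are not taken from the authors' code.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # grayscale image with pixel values in [0, 1]


def _clip(value: float) -> float:
    """Keep a pixel value inside [0, 1]."""
    return min(1.0, max(0.0, value))


def nonlinear_intensity(img: Image, rng: random.Random) -> Image:
    # Assumption: a random gamma curve as a simple monotonic non-linear mapping.
    gamma = rng.uniform(0.5, 2.0)
    return [[p ** gamma for p in row] for row in img]


def brightness(img: Image, rng: random.Random) -> Image:
    # Assumption: a global multiplicative brightness change, then clipping.
    scale = rng.uniform(0.5, 1.5)
    return [[_clip(p * scale) for p in row] for row in img]


def noise(img: Image, rng: random.Random) -> Image:
    # Assumption: additive zero-mean Gaussian noise with a fixed sigma.
    sigma = 0.05
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


# One entry per auxiliary task named in the abstract.
TASKS: Dict[str, Callable[[Image, random.Random], Image]] = {
    "multi_modal": nonlinear_intensity,
    "multi_exposure": brightness,
    "multi_focus": noise,
}


def destroy(img: Image, rng: random.Random) -> Tuple[str, Image]:
    """Randomly pick one auxiliary task and return (task name, destroyed image)."""
    name = rng.choice(sorted(TASKS))
    return name, TASKS[name](img, rng)


def reconstruction_loss(pred: Image, target: Image) -> float:
    """Mean squared error between a reconstruction and the clean image."""
    diffs = [(a - b) ** 2 for pr, tr in zip(pred, target) for a, b in zip(pr, tr)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.0, 0.25], [0.5, 1.0]]
    task, destroyed = destroy(clean, rng)
    # An encoder-decoder would map `destroyed` back to `clean`; here we only
    # measure how far the destroyed input is from the clean target.
    print(task, reconstruction_loss(destroyed, clean))
```

In a training loop under these assumptions, the encoder-decoder would receive the destroyed image and be optimized to reproduce the clean one, with a new task drawn for each sample so that all three transformations contribute to the same shared features.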


Pages: 16

50 records in total

[1] Self-supervised multi-modal fusion network for multi-modal thyroid ultrasound image diagnosis
Xiang, Zhuo
Zhuo, Qiuluan
Zhao, Cheng
Deng, Xiaofei
Zhu, Ting
Wang, Tianfu
Jiang, Wei
Lei, Baiying
COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 150
[2] Multimodal Image Fusion via Self-Supervised Transformer
Zhang, Jing
Liu, Yu
Liu, Aiping
Xie, Qingguo
Ward, Rabab
Wang, Z. Jane
Chen, Xun
IEEE SENSORS JOURNAL, 2023, 23 (09) : 9796 - 9807
[3] Self-Supervised Distilled Learning for Multi-modal Misinformation Identification
Mu, Michael
Das Bhattacharjee, Sreyasee
Yuan, Junsong
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 2818 - 2827
[4] SELF-SUPERVISED LEARNING OF MULTI-MODAL COOPERATION FOR SAR DESPECKLING
Gaya, Victor
Dalsasso, Emanuele
Denis, Loic
Tupin, Florence
Pinel-Puyssegur, Beatrice
Guerin, Cyrielle
IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024 : 2180 - 2183
[5] Exploring Self-Supervised Learning for Multi-Modal Remote Sensing Pre-Training via Asymmetric Attention Fusion
Xu, Guozheng
Jiang, Xue
Li, Xiangtai
Zhang, Ze
Liu, Xingzhao
REMOTE SENSING, 2023, 15 (24)
[6] TS-DENet: a transferable self-supervised learning method for multi-modal fluorescence image
Huang, Liangliang
Wen, Zhong
Wang, Zhaokai
Li, Quanzhi
Deng, Qilin
Liu, Xu
Yang, Qing
APPLIED OPTICS, 2025, 64 (10) : 2534 - 2544
[7] Self-Supervised Entity Alignment Based on Multi-Modal Contrastive Learning
Bo Liu
Ruoyi Song
Yuejia Xiang
Junbo Du
Weijian Ruan
Jinhui Hu
IEEE/CAA Journal of Automatica Sinica, 2022, 9 (11) : 2031 - 2033
[8] Multi-modal Food Recommendation Using Clustering and Self-supervised Learning
Zhang, Yixin
Zhou, Xin
Meng, Qianwen
Zhu, Fanglin
Xu, Yonghui
Shen, Zhiqi
Cui, Lizhen
PRICAI 2024: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2025, 15281 : 269 - 281
[9] Self-Supervised Entity Alignment Based on Multi-Modal Contrastive Learning
Liu, Bo
Song, Ruoyi
Xiang, Yuejia
Du, Junbo
Ruan, Weijian
Hu, Jinhui
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (11) : 2031 - 2033
[10] Highly Interactive Self-Supervised Learning for Multi-Modal Trajectory Prediction
Xie, Wenda
Liu, Yahui
Zhao, Hongxia
Guo, Chao
Dai, Xingyuan
Lv, Yisheng
IFAC PAPERSONLINE, 2024, 58 (10) : 231 - 236
