LeGFusion: Locally Enhanced Global Learning for Multimodal Image Fusion

Cited by: 1

Authors:

Zhang, Jing [1]

Liu, Aiping [1]

Liu, Yu [2]

Qiu, Bensheng [1]

Xie, Qingguo [1]

Chen, Xun [1]

Affiliations:

[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China

[2] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China

Source:

IEEE SENSORS JOURNAL | 2024 / Vol. 24 / Issue 08

Keywords:

Image fusion; Transformers; Task analysis; Sensors; Feature extraction; Generative adversarial networks; Biomedical imaging; Locally enhanced global learning; multimodal image fusion (MMIF); transformer; NETWORK; PERFORMANCE; INFORMATION; FRAMEWORK; NEST;

DOI:

10.1109/JSEN.2024.3371056

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Multimodal image fusion (MMIF) synthesizes a single image from multi-sensor images of the same scene, providing more comprehensive scene characteristics and overcoming the limitations of single-type hardware. To handle MMIF tasks, current deep learning (DL)-based methods usually use convolutional neural networks (CNNs), alone or combined with transformers, to extract local and global contextual information from the source images. However, none of the existing works fully explores contextual information both across modalities and within single modalities, which limits fusion quality. To this end, we propose a new MMIF method based on locally enhanced global learning, termed LeGFusion. Specifically, the LeGFusion network is built on the locally enhanced transformer block (LETB), which captures long-range dependencies through nonoverlapping window-based self-attention and captures useful local context by introducing the convolution operator into the transformer. On the one hand, several LETBs are deployed to extract global context from each modality while emphasizing its local information. On the other hand, a fusion module, also composed of LETBs, integrates multimodal features by perceiving cross-modal local and global interactions. Powered by this intramodal and intermodal exploration of contextual information, LeGFusion is highly capable of capturing significant complementary information for image fusion. Extensive experiments are conducted on two types of MMIF tasks: infrared-visible image fusion (IVF) and medical image fusion. The qualitative and quantitative evaluation results demonstrate the superiority of LeGFusion over state-of-the-art methods. Furthermore, we validate the generalization ability of LeGFusion without fine-tuning and obtain strong results.
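The abstract names two ingredients of an LETB: self-attention restricted to nonoverlapping windows, and a convolution that adds local context. The NumPy sketch below illustrates only that general idea and is not the authors' implementation. All function names (`window_partition`, `window_self_attention`, `depthwise_conv3x3`, `toy_block`) are hypothetical, the attention uses identity query/key/value projections and a single head for brevity, and the way the two parts are combined (two residual additions) is an assumption, since the abstract does not specify the block layout.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def window_self_attention(x, ws):
    """Single-head self-attention computed independently inside each window.

    Assumption: identity Q/K/V projections keep the sketch short; a real
    block would use learned linear projections and several heads.
    """
    H, W, C = x.shape
    win = window_partition(x, ws)                        # (nW, ws*ws, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(C)   # (nW, ws*ws, ws*ws)
    scores -= scores.max(axis=-1, keepdims=True)         # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return window_reverse(attn @ win, ws, H, W)

def depthwise_conv3x3(x, kernel):
    """3x3 depthwise filter with zero padding; kernel has shape (3, 3, C).

    As in most deep learning libraries, this is a cross-correlation.
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernel[dy, dx, :]
    return out

def toy_block(x, ws, kernel):
    """Hypothetical block: windowed attention, then a local convolution,
    each wrapped in a residual connection (layout is an assumption)."""
    x = x + window_self_attention(x, ws)
    x = x + depthwise_conv3x3(x, kernel)
    return x

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 4))    # toy 8x8 feature map, 4 channels
kernel = rng.standard_normal((3, 3, 4)) * 0.1
fused = toy_block(features, ws=4, kernel=kernel)
print(fused.shape)  # (8, 8, 4)
```

In this sketch the attention step never mixes pixels from different windows, so the 3x3 filter is what lets information cross window borders; how LeGFusion actually handles this is described in the paper, not in the abstract.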


Pages: 12806-12818

Page count: 13
