MPCFusion: Multi-scale parallel cross fusion for infrared and visible images via convolution and vision Transformer

Citations: 12
Authors
Tang, Haojie [1]
Qian, Yao [1]
Xing, Mengliang [1]
Cao, Yisheng [1]
Liu, Gang [1]
Affiliations
[1] Shanghai Univ Elect Power, Sch Automat Engn, Shanghai 200090, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image fusion; Vision Transformer; Convolution; Multi-scale feature; Infrared; NETWORK
DOI
10.1016/j.optlaseng.2024.108094
Chinese Library Classification
O43 [Optics]
Discipline classification codes
070207; 0803
Abstract
The image fusion community is thriving with the wave of deep learning, and the most popular fusion methods are usually built upon well-designed network structures. However, most current methods do not fully exploit deeper features and ignore the importance of long-range dependencies. In this paper, a convolution and vision Transformer-based multi-scale parallel cross fusion network for infrared and visible images (MPCFusion) is proposed. To exploit deeper texture details, a feature extraction module based on convolution and vision Transformer is designed. To correlate the shallow features of different modalities, a parallel cross-attention module is proposed, in which a parallel-channel model efficiently preserves the modality-specific features, followed by a cross-spatial model that ensures information interaction between the different modalities. Moreover, a cross-domain attention module based on convolution and vision Transformer is proposed to capture long-range dependencies between in-depth features, which effectively solves the problem of global context loss. Finally, a nest-connection-based decoder is used for feature reconstruction. In particular, we design a new texture-guided structural similarity loss function to drive the network to preserve more complete texture details. Extensive experimental results illustrate that MPCFusion shows excellent fusion performance and generalization capabilities. The source code will be released at https://github.com/YQ-097/MPCFusion.
Pages: 13