CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach

Cited by: 122
Authors
Li, Hui [1 ]
Wu, Xiao-Jun [1 ]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Int Joint Lab Artificial Intelligence Jiangsu Prov, Wuxi 214122, Peoples R China
Keywords
Image fusion; Transformer; Cross attention; Infrared image; Visible image
DOI
10.1016/j.inffus.2023.102147
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal visual information fusion aims to integrate multi-sensor data into a single image that contains more complementary information and fewer redundant features. However, the complementary information is hard to extract, especially for infrared and visible images, between which there is a large modality gap. Common cross attention modules only consider correlation; image fusion tasks, on the contrary, need to focus on complementarity (uncorrelation). Hence, in this paper, a novel cross attention mechanism (CAM) is proposed to enhance the complementary information. Furthermore, a fusion scheme based on a two-stage training strategy is presented to generate the fused images. In the first stage, two auto-encoder networks with the same architecture are trained, one for each modality. Then, with the encoders fixed, the CAM and a decoder are trained in the second stage. With the trained CAM, the features extracted from the two modalities are integrated into one fused feature in which the complementary information is enhanced and the redundant features are reduced. Finally, the fused image is generated by the trained decoder. The experimental results illustrate that the proposed fusion method achieves state-of-the-art (SOTA) fusion performance compared with existing fusion networks. The code for our fusion method will be made available soon.
Pages: 11