AMLCA: Additive multi-layer convolution-guided cross-attention network for visible and infrared image fusion

Cited by: 2
Authors
Wang, Dongliang [1 ]
Huang, Chuang [1 ]
Pan, Hao [1 ]
Sun, Yuan [2 ]
Dai, Jian [3 ]
Li, Yanan [3 ]
Ren, Zhenwen [1 ,3 ]
Affiliations
[1] Southwest Univ Sci & Technol, Sch Natl Def Sci & Technol, Mianyang 621010, Peoples R China
[2] Sichuan Univ, Coll Comp Sci, Chengdu 610044, Peoples R China
[3] China South Ind Grp Corp, Southwest Automat Res Inst, Mianyang 621000, Peoples R China
Keywords
Image processing; Image fusion; Transformer; Infrared image; Visible image
DOI
10.1016/j.patcog.2025.111468
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multimodal image fusion, which aims to create an information-rich fused image by combining complementary information from different wavebands, is widely used in the processing of multispectral signals, e.g., visible and infrared images. Current fusion methods face significant challenges in extracting complementary information from sensors while simultaneously preserving local details and global dependencies. To address this challenge, we propose an additive multi-layer convolution-guided cross-attention network (AMLCA) for visible and infrared image fusion, which consists of two sub-modules, i.e., an additive cross-attention module (ACAM) and a wavelet convolution-guided transformer module (WCGTM). Specifically, the former enhances feature interaction and captures global holistic information by using an additive cross-attention mechanism, while the latter relies on wavelet convolution to guide the transformer, enhancing the preservation of details from both sources and improving the extraction of local detail information. Moreover, we propose a multi-layer fusion strategy that leverages hidden complementary features from various layers. Therefore, AMLCA effectively extracts complementary information from local details and global dependencies, significantly enhancing overall performance. Extensive experiments and ablation analysis on public datasets demonstrate the superiority and effectiveness of AMLCA. The source code is available at https://github.com/Wangdl2000/AMLCA-code.
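The abstract's additive cross-attention mechanism is not specified in detail here; the sketch below assumes a Bahdanau-style additive scoring scheme, where tokens of one modality (e.g., visible) attend over tokens of the other (e.g., infrared) via scores v·tanh(W_q q + W_k k). All names (`additive_cross_attention`, `W_q`, `W_k`, `v`) are illustrative, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_cross_attention(q_feat, kv_feat, W_q, W_k, v):
    """Hypothetical additive (Bahdanau-style) cross-attention.

    q_feat:  (Nq, d) tokens from one modality (e.g., visible)
    kv_feat: (Nk, d) tokens from the other modality (e.g., infrared)
    W_q, W_k: (d, h) projections into a shared hidden space
    v: (h,)          scoring vector
    Returns fused tokens (Nq, d) and the attention map (Nq, Nk).
    """
    qh = q_feat @ W_q                               # (Nq, h)
    kh = kv_feat @ W_k                              # (Nk, h)
    # score[i, j] = v . tanh(W_q q_i + W_k k_j): additive, not dot-product
    hidden = np.tanh(qh[:, None, :] + kh[None, :, :])  # (Nq, Nk, h)
    scores = hidden @ v                             # (Nq, Nk)
    attn = softmax(scores, axis=-1)                 # rows sum to 1
    # Aggregate the other modality's tokens; residual keeps query content
    fused = attn @ kv_feat + q_feat                 # (Nq, d)
    return fused, attn

# Toy usage: 4 visible tokens attend over 6 infrared tokens
rng = np.random.default_rng(0)
d, h = 8, 16
vis = rng.normal(size=(4, d))
ir = rng.normal(size=(6, d))
W_q, W_k = rng.normal(size=(d, h)), rng.normal(size=(d, h))
v = rng.normal(size=h)
out, attn = additive_cross_attention(vis, ir, W_q, W_k, v)
```

The additive form trades the dot-product's matrix-multiply efficiency for a learned nonlinear compatibility function, which can help when the two modalities' feature spaces are not directly comparable.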
Pages: 11
相关论文
共 40 条
[1]   Dual-YOLO Architecture from Infrared and Visible Images for Object Detection [J].
Bao, Chun ;
Cao, Jie ;
Hao, Qun ;
Cheng, Yang ;
Ning, Yaqian ;
Zhao, Tianhua .
SENSORS, 2023, 23 (06)
[2]  
Bhuvana J, 2024, Meas Sens, V36, DOI [10.1016/j.measen.2024.101309, DOI 10.1016/J.MEASEN.2024.101309]
[3]   Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition [J].
Cui, Guangmang ;
Feng, Huajun ;
Xu, Zhihai ;
Li, Qi ;
Chen, Yueting .
OPTICS COMMUNICATIONS, 2015, 341 :199-209
[4]  
Das M., 2024, Data Analytics for Intelligent Systems: Techniques and Solutions, P1
[5]   Image quality measures and their performance [J].
Eskicioglu, AM ;
Fisher, PS .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1995, 43 (12) :2959-2965
[6]   Wavelet Convolutions for Large Receptive Fields [J].
Finder, Shahaf E. ;
Amoyal, Roy ;
Treister, Eran ;
Freifeld, Oren .
COMPUTER VISION - ECCV 2024, PT LIV, 2025, 15112 :363-380
[7]  
Haibo Zhao, 2021, 2021 International Conference on Information Technology and Biomedical Engineering (ICITBE), P71, DOI 10.1109/ICITBE54178.2021.00025
[8]   A new image fusion performance metric based on visual information fidelity [J].
Han, Yu ;
Cai, Yunze ;
Cao, Yin ;
Xu, Xiaoming .
INFORMATION FUSION, 2013, 14 (02) :127-135
[9]   GAF-Net: Improving the Performance of Remote Sensing Image Fusion using Novel Global Self and Cross Attention Learning [J].
Jha, Ankit ;
Bose, Shirsha ;
Banerjee, Biplab .
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, :6343-6352
[10]   LLVIP: A Visible-infrared Paired Dataset for Low-light Vision [J].
Jia, Xinyu ;
Zhu, Chuang ;
Li, Minzhen ;
Tang, Wenqi ;
Zhou, Wenli .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, :3489-3497