ITFuse: An interactive transformer for infrared and visible image fusion

Cited by: 24
Authors
Tang, Wei [1 ]
He, Fazhi [1 ]
Liu, Yu [2 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Transformer; Interactive network; Infrared image; Deep learning; Network
DOI
10.1016/j.patcog.2024.110822
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Infrared and visible image fusion (IVIF) has attracted increasing attention from the community owing to its promising results in downstream applications. However, most existing deep fusion models perform either feature-level or image-level fusion, leading to information loss. In this paper, we propose an interactive transformer for IVIF, termed ITFuse. In contrast to previous algorithms, ITFuse consists of feature interactive modules (FIMs) and a feature reconstruction module (FRM) that alternately extract and integrate important features. Specifically, to adequately exploit the common properties of different source images, we design a residual attention block (RAB) for mutual feature representation. To aggregate the distinct characteristics of the corresponding input images, we leverage interactive attention (ITA) to incorporate complementary information for comprehensive feature preservation and interaction. In addition, cross-modal attention (CMA) and a transformer block (TRB) are presented to fully merge the extracted features and construct long-range relationships. Furthermore, we devise a pixel loss and a structural loss to train the proposed deep fusion model in an unsupervised manner for further performance improvement. Extensive experiments on popular databases demonstrate that ITFuse performs better than other representative state-of-the-art methods in both qualitative and quantitative assessments. The source code of the proposed method is available at https://github.com/tthinking/ITFuse.
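The abstract mentions that ITFuse is trained unsupervised with a pixel loss and a structural loss, but does not spell out their form here. The sketch below shows one common instantiation of such a scheme: an intensity-fidelity term against both source images plus an SSIM-style structural term. All names (`pixel_loss`, `structural_loss`, `alpha`), the global-statistics simplification of SSIM, and the weighting are illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch of an unsupervised fusion loss: a pixel term plus an
# SSIM-style structural term against each source image. Images are modeled
# as flat lists of pixel intensities in [0, 1] for simplicity.

def pixel_loss(fused, ir, vis):
    # Mean absolute deviation of the fused image from both source images.
    n = len(fused)
    return sum(abs(f - i) + abs(f - v) for f, i, v in zip(fused, ir, vis)) / n

def structural_loss(fused, src, c1=1e-4, c2=9e-4):
    # Global SSIM-style dissimilarity between the fused image and one source.
    # (A real implementation would compute SSIM over local windows.)
    n = len(fused)
    mu_f = sum(fused) / n
    mu_s = sum(src) / n
    var_f = sum((x - mu_f) ** 2 for x in fused) / n
    var_s = sum((x - mu_s) ** 2 for x in src) / n
    cov = sum((f - mu_f) * (s - mu_s) for f, s in zip(fused, src)) / n
    ssim = ((2 * mu_f * mu_s + c1) * (2 * cov + c2)) / (
        (mu_f ** 2 + mu_s ** 2 + c1) * (var_f + var_s + c2)
    )
    return 1.0 - ssim

def total_loss(fused, ir, vis, alpha=0.5):
    # Weighted sum: pixel fidelity plus structural similarity to each source.
    return pixel_loss(fused, ir, vis) + alpha * (
        structural_loss(fused, ir) + structural_loss(fused, vis)
    )
```

Because both terms vanish when the fused image matches the sources and grow with disagreement, minimizing this objective needs no ground-truth fused images, which is what makes the training unsupervised.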
Pages: 13