THFuse: An infrared and visible image fusion network using transformer and hybrid feature extractor

Cited by: 49
Authors
Chen, Jun [1,2,3]
Ding, Jianfeng [1,2,3]
Yu, Yang [4,5]
Gong, Wenping [6]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat Co, Wuhan 430074, Peoples R China
[3] Minist Educ, Engn Res Ctr Intelligent Technol Geoexplorat, Wuhan 430074, Peoples R China
[4] Chinese Acad Sci, Shanghai Inst Tech Phys, Shanghai 200083, Peoples R China
[5] Chinese Acad Sci, Key Lab Infrared Syst Detecting & Imaging Technol, Shanghai 200083, Peoples R China
[6] China Univ Geosci, Fac Engn, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Vision transformer; Infrared image; Visible image; Deep learning; INFORMATION; NEST;
DOI
10.1016/j.neucom.2023.01.033
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Infrared and visible image fusion aims to integrate complementary information from different types of images into a single image. Existing image fusion methods are primarily based on convolutional neural networks (CNNs), which ignore long-range dependencies in images and therefore struggle to generate fused images with good complementarity. Motivated by the importance of global information, we introduce the transformer into a CNN-based fusion network to improve whole-image-level perception in complex fusion scenarios. In this paper, we propose an end-to-end image fusion framework based on a transformer and a hybrid feature extractor, which enables the network to attend to both global and local information, using the characteristics of the transformer to compensate for the shortcomings of the CNN. In our network, a dual-branch CNN module first extracts shallow features from the images, a vision transformer module then captures the global channel and spatial relationships within those features, and finally an image reconstruction module produces the fusion result. Using a pre-trained VGG19 network, we compute the loss on features at different depths according to the type of source image. Experimental results demonstrate the effectiveness of adding the vision transformer module. Compared with other traditional and deep learning methods, our method achieves state-of-the-art performance in both qualitative and quantitative experiments.
Pages: 71-82
Number of pages: 12
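The abstract above outlines a three-stage pipeline: a dual-branch CNN extracts shallow features from the infrared and visible inputs, a vision transformer module models the global channel and spatial relationships in those features, and a reconstruction module produces the fused image. The following is a minimal PyTorch-style sketch of that data flow; all module names, layer sizes, and the concatenation-based feature merging are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ShallowBranch(nn.Module):
    """One CNN branch for shallow feature extraction (layer sizes are illustrative)."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class GlobalTransformer(nn.Module):
    """Transformer encoder over flattened spatial tokens, modeling the
    long-range (global) dependencies that plain convolutions miss."""
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) spatial tokens
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class FusionNet(nn.Module):
    """Dual-branch CNN -> transformer -> reconstruction, end to end."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.ir_branch = ShallowBranch(1, feat_ch)
        self.vis_branch = ShallowBranch(1, feat_ch)
        self.transformer = GlobalTransformer(dim=2 * feat_ch)
        self.reconstruct = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Tanh(),
        )
    def forward(self, ir, vis):
        feats = torch.cat([self.ir_branch(ir), self.vis_branch(vis)], dim=1)
        feats = self.transformer(feats)           # inject global context
        return self.reconstruct(feats)

# Usage: fused = FusionNet()(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
```

The point of the sketch is the ordering: convolutions supply local texture features cheaply, and the transformer is applied only once those features exist, so its attention operates on semantically meaningful tokens rather than raw pixels.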
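The abstract also states that the loss is computed on pre-trained VGG19 features at different depths depending on the kind of source image. A hedged sketch of such a multi-depth perceptual loss follows; the specific layer indices, the L2 distance, and the per-depth weighting scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGGPerceptualLoss(nn.Module):
    """Compare images in frozen VGG19 feature space at chosen depths.
    The layer indices below are illustrative, not the paper's exact choice."""
    def __init__(self, layer_ids=(3, 8, 17, 26)):
        super().__init__()
        features = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in features.parameters():
            p.requires_grad = False          # VGG19 stays fixed during training
        self.features = features
        self.layer_ids = set(layer_ids)

    def extract(self, x):
        if x.size(1) == 1:                   # VGG expects 3-channel input
            x = x.repeat(1, 3, 1, 1)
        feats = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, fused, reference, depth_weights=None):
        f_feats = self.extract(fused)
        r_feats = self.extract(reference)
        w = depth_weights or [1.0] * len(f_feats)
        return sum(wi * torch.mean((f - r) ** 2)
                   for wi, f, r in zip(w, f_feats, r_feats))
```

With per-modality `depth_weights` (a hypothetical parameter here), the fused image can be pulled toward shallow texture features of the visible input and deeper semantic features of the infrared input, which is one plausible reading of "the loss in the features of different depths according to the different kinds of original images".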