CMFA_Net: A cross-modal feature aggregation network for infrared-visible image fusion

Cited by: 15
Authors
Ding, Zhaisheng [1 ]
Li, Haiyan [1 ]
Zhou, Dongming [1 ]
Li, Hongsong [1 ]
Liu, Yanyu [1 ]
Hou, Ruichao [2 ]
Affiliations
[1] Yunnan Univ, Sch Informat, Kunming 650504, Yunnan, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal; Attention mechanism; Image fusion; Unsupervised learning; End-to-end network; Infrared-visible images; PERFORMANCE; FRAMEWORK;
DOI
10.1016/j.infrared.2021.103905
Chinese Library Classification
TH7 [Instruments and Apparatus]
Subject Classification Codes
0804; 080401; 081102
Abstract
Infrared and visible image fusion is a typical cross-modal information enhancement technology that aims to extract complementary cues from different sensors to reconstruct an informative image or video. Many related works focus on designing hand-crafted fusion rules and ignore the inner complementary potential of the modalities, and thus fail to exploit the capacity of deep models thoroughly. In this work, an unsupervised cross-modal feature aggregation network (CMFA_Net) is developed, which effectively explores the latent correlations between internal characteristics and processes this information to fuse a satisfactory image. Firstly, a densely integrated structure and an attention module are proposed to form a feature extractor. Subsequently, the l1-norm and the attention mechanism are combined to fuse the affinity features of the cross-modal images. Finally, the fused image is reconstructed by a deconvolution block. To guarantee the clarity and information richness of the fused image, a specific loss function is put forward that combines an average-pixel-decision structural similarity term (SSIM-p) with a content-gram variation (CGV) term for training the model on the KAIST dataset. Extensive experiments verify the effectiveness and robustness of the proposed model and demonstrate that it outperforms state-of-the-art methods, achieving superior performance with lower computational cost in both subjective and objective evaluations.
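The fusion step named in the abstract (l1-norm combined with an attention-style weighting) can be illustrated with a short sketch. The following is a minimal PyTorch illustration of an l1-norm activity-level fusion rule, not the paper's exact module; the function name l1_activity_fusion and the softmax weighting across modalities are assumptions inferred from the abstract.

```python
# A minimal sketch of l1-norm activity-level feature fusion, a common
# strategy in infrared-visible fusion networks. All names are hypothetical;
# the paper's actual attention module is not specified in this record.
import torch
import torch.nn.functional as F

def l1_activity_fusion(feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
    """Fuse two (N, C, H, W) feature maps via soft weights derived from
    their per-pixel l1-norm activity."""
    # Per-pixel activity: l1-norm taken across the channel dimension.
    act_ir = feat_ir.abs().sum(dim=1, keepdim=True)    # (N, 1, H, W)
    act_vis = feat_vis.abs().sum(dim=1, keepdim=True)  # (N, 1, H, W)
    # Soft spatial weights: at each pixel, the two modality weights sum to 1.
    weights = F.softmax(torch.cat([act_ir, act_vis], dim=1), dim=1)
    w_ir, w_vis = weights[:, 0:1], weights[:, 1:2]
    return w_ir * feat_ir + w_vis * feat_vis
```

Under this weighting, whichever modality responds more strongly at a location dominates the fused feature there, which matches the stated goal of mining complementary cues rather than averaging them.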
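The record likewise does not give the exact definition of the content-gram variation (CGV) loss term, so the sketch below is a hedged, plausible Gram-matrix-based reading; gram_matrix, cgv_loss, and the equal weighting of both source modalities are hypothetical. The full objective would additionally include the SSIM-p term (e.g., 1 - SSIM from an off-the-shelf implementation).

```python
# A hypothetical sketch of a Gram-matrix-based content loss, one plausible
# reading of the "content-gram variation (CGV)" term named in the abstract.
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a (N, C, H, W) feature map,
    normalized by the feature size."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def cgv_loss(feat_fused, feat_ir, feat_vis):
    # Match the fused features' Gram statistics to those of both
    # source modalities (equal weighting is an assumption).
    g_f = gram_matrix(feat_fused)
    return F.mse_loss(g_f, gram_matrix(feat_ir)) + F.mse_loss(g_f, gram_matrix(feat_vis))
```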
Pages: 13