CMFA_Net: A cross-modal feature aggregation network for infrared-visible image fusion

Cited by: 15
Authors
Ding, Zhaisheng [1 ]
Li, Haiyan [1 ]
Zhou, Dongming [1 ]
Li, Hongsong [1 ]
Liu, Yanyu [1 ]
Hou, Ruichao [2 ]
Affiliations
[1] Yunnan Univ, Sch Informat, Kunming 650504, Yunnan, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal; Attention mechanism; Image fusion; Unsupervised learning; End-to-end network; Infrared-visible images; PERFORMANCE; FRAMEWORK;
DOI
10.1016/j.infrared.2021.103905
Chinese Library Classification
TH7 [Instruments and Apparatus]
Subject Classification Codes
0804; 080401; 081102
Abstract
Infrared and visible image fusion is a typical cross-modal information enhancement technology that aims to extract complementary cues from different sensors to reconstruct an informative image or video. Many related works focus on designing hand-crafted fusion rules and ignore the inner complementary potential of the modalities, and thus fail to exploit the capacity of deep models thoroughly. In this work, an unsupervised cross-modal feature aggregation network (CMFA_Net) is developed, which effectively explores the latent correlations between internal characteristics and processes this information to fuse a satisfactory image. Firstly, a densely integrated structure and an attention module are proposed to form a feature extractor. Subsequently, the l1-norm and the attention mechanism are combined to fuse the affinity features of the cross-modal images. Finally, the fused image is reconstructed by a deconvolution block. To guarantee the clarity and information richness of the fused image, a specific loss function is put forward that combines an average-pixel-decision structural similarity term (SSIM-p) with a content-gram variation (CGV) term for training the model on the KAIST dataset. Extensive experiments verify the effectiveness and robustness of the proposed model and demonstrate that it outperforms state-of-the-art methods, achieving superior performance with lower computational cost in both subjective and objective evaluations.
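The fusion step named in the abstract (l1-norm combined with an attention-style weighting) can be illustrated with a short sketch. The following is a minimal PyTorch illustration of an l1-norm activity-level fusion rule, not the paper's exact module; the function name l1_activity_fusion and the softmax weighting across modalities are assumptions inferred from the abstract.

```python
# A minimal sketch of l1-norm activity-level feature fusion, a common
# strategy in infrared-visible fusion networks. All names are hypothetical;
# the paper's actual attention module is not specified in this record.
import torch
import torch.nn.functional as F

def l1_activity_fusion(feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
    """Fuse two (N, C, H, W) feature maps via soft weights derived from
    their per-pixel l1-norm activity."""
    # Per-pixel activity: l1-norm taken across the channel dimension.
    act_ir = feat_ir.abs().sum(dim=1, keepdim=True)    # (N, 1, H, W)
    act_vis = feat_vis.abs().sum(dim=1, keepdim=True)  # (N, 1, H, W)
    # Soft spatial weights: at each pixel, the two modality weights sum to 1.
    weights = F.softmax(torch.cat([act_ir, act_vis], dim=1), dim=1)
    w_ir, w_vis = weights[:, 0:1], weights[:, 1:2]
    return w_ir * feat_ir + w_vis * feat_vis
```

Under this weighting, whichever modality responds more strongly at a location dominates the fused feature there, which matches the stated goal of mining complementary cues rather than averaging them.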
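The record likewise does not give the exact definition of the content-gram variation (CGV) loss term, so the sketch below is a hedged, plausible Gram-matrix-based reading; gram_matrix, cgv_loss, and the equal weighting of both source modalities are hypothetical. The full objective would additionally include the SSIM-p term (e.g., 1 - SSIM from an off-the-shelf implementation).

```python
# A hypothetical sketch of a Gram-matrix-based content loss, one plausible
# reading of the "content-gram variation (CGV)" term named in the abstract.
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a (N, C, H, W) feature map,
    normalized by the feature size."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def cgv_loss(feat_fused, feat_ir, feat_vis):
    # Match the fused features' Gram statistics to those of both
    # source modalities (equal weighting is an assumption).
    g_f = gram_matrix(feat_fused)
    return F.mse_loss(g_f, gram_matrix(feat_ir)) + F.mse_loss(g_f, gram_matrix(feat_vis))
```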
Pages: 13