TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Cited by: 9
|
Authors
Li, Jing [1 ]
Yang, Bin [2 ]
Bai, Lu [3 ,4 ]
Dou, Hao [5 ]
Li, Chang [6 ]
Ma, Lingfei [7 ]
Affiliations
[1] Cent Univ Finance & Econ, Sch Informat, Beijing 102206, Peoples R China
[2] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China
[3] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[4] Cent Univ Finance & Econ, Beijing 100081, Peoples R China
[5] China Elect Technol Grp Corp, Res Inst 38, Hefei 230088, Peoples R China
[6] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[7] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; infrared image; transformer; visible image; MULTI-FOCUS; NETWORK; FRAMEWORK;
DOI
10.1109/TIM.2023.3312755
CLC Classification
TM [Electrotechnics]; TN [Electronics and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Existing transformer-based infrared and visible image fusion methods mainly exploit the intra-modal self-attention correlations within each image, yet they neglect the inter-modal discrepancies between the two source images at the same position, where the information carried by the infrared token and the visible token is unbalanced. We therefore develop a pure transformer fusion model that reconstructs the fused image in the token dimension: it not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures inter-modal attentive correlations in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when fusing the corresponding infrared and visible tokens, learnable attentive weights dynamically measure the correlation of inter-modal tokens at the same position. Concretely, infrared and visible tokens are first computed by two independent transformers, which extract intra-modal long-range dependencies separately owing to the modal difference between the sources. The corresponding infrared and visible tokens are then fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlation of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model in comparison with nine state-of-the-art methods.
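The position-wise weighted token fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function `fuse_tokens` and the projection `w` standing in for the learnable attentive weights are hypothetical names, and the scoring scheme (a softmax over per-modality scores at each token position) is one plausible way to realize "dynamically measuring the correlation of inter-modal tokens at the same position."

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_tokens(ir_tokens, vis_tokens, w):
    """Fuse corresponding infrared and visible tokens position by position.

    ir_tokens, vis_tokens: (N, D) arrays of N tokens with D channels each.
    w: (D, 2) projection that scores each modality per token position
       (a stand-in for the paper's learnable attentive weights).
    """
    # Score both modalities at every token position, then normalize so the
    # two weights at each position sum to 1, balancing the unbalanced
    # inter-modal information before fusion.
    scores = np.stack([ir_tokens @ w[:, 0], vis_tokens @ w[:, 1]], axis=-1)  # (N, 2)
    alpha = softmax(scores, axis=-1)                                         # (N, 2)
    return alpha[:, :1] * ir_tokens + alpha[:, 1:] * vis_tokens              # (N, D)

rng = np.random.default_rng(0)
ir = rng.standard_normal((16, 8))   # 16 infrared tokens, 8 channels
vis = rng.standard_normal((16, 8))  # 16 visible tokens at the same positions
w = rng.standard_normal((8, 2))
fused = fuse_tokens(ir, vis, w)
print(fused.shape)  # (16, 8)
```

Because the two weights at each position are a softmax pair, every fused token is a convex combination of its infrared and visible counterparts; the multigrained extension in the paper would repeat this fusion at several token sizes.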
Pages: 14
Related Papers
50 records in total
  • [41] Infrared and visible image fusion via detail preserving adversarial learning
    Ma, Jiayi
    Liang, Pengwei
    Yu, Wei
    Chen, Chen
    Guo, Xiaojie
    Wu, Jia
    Jiang, Junjun
    INFORMATION FUSION, 2020, 54 : 85 - 98
  • [42] Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion
    Wang, Lei
    Hu, Ziming
    Kong, Quan
    Qi, Qian
    Liao, Qing
    ENTROPY, 2023, 25 (03)
  • [43] Infrared and Visible Image Fusion via Multiscale Receptive Field Amplification Fusion Network
    Ji, Chuanming
    Zhou, Wujie
    Lei, Jingsheng
    Ye, Lv
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 493 - 497
  • [44] Adjustable Visible and Infrared Image Fusion
    Wu, Boxiong
    Nie, Jiangtao
    Wei, Wei
    Zhang, Lei
    Zhang, Yanning
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13463 - 13477
  • [45] Restorable Visible and Infrared Image Fusion
    Kang, Jihun
    Horita, Daichi
    Tsubota, Koki
    Aizawa, Kiyoharu
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1560 - 1564
  • [46] AFT: Adaptive Fusion Transformer for Visible and Infrared Images
    Chang, Zhihao
    Feng, Zhixi
    Yang, Shuyuan
    Gao, Quanwei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2077 - 2092
  • [47] HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer
    Chen, Jun
    Ding, Jianfeng
    Ma, Jiayi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10145 - 10159
  • [48] SAM-guided multi-level collaborative Transformer for infrared and visible image fusion
    Guo, Lin
    Luo, Xiaoqing
    Liu, Yue
    Zhang, Zhancheng
    Wu, Xiaojun
    PATTERN RECOGNITION, 2025, 162
  • [49] An infrared and visible image fusion using knowledge measures for intuitionistic fuzzy sets and Swin Transformer
    Khan, Muhammad Jabir
    Jiang, Shu
    Ding, Weiping
    Huang, Jiashuang
    Wang, Haipeng
    INFORMATION SCIENCES, 2024, 648
  • [50] TBRAFusion: Infrared and visible image fusion based on two-branch residual attention Transformer
    Zhang, Wangwei
    Sun, Hao
    Zhou, Bin
    ELECTRONIC RESEARCH ARCHIVE, 2024, 33 (01): : 158 - 180