TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Cited by: 9
|
Authors
Li, Jing [1 ]
Yang, Bin [2 ]
Bai, Lu [3 ,4 ]
Dou, Hao [5 ]
Li, Chang [6 ]
Ma, Lingfei [7 ]
Affiliations
[1] Cent Univ Finance & Econ, Sch Informat, Beijing 102206, Peoples R China
[2] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China
[3] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[4] Cent Univ Finance & Econ, Beijing 100081, Peoples R China
[5] China Elect Technol Grp Corp, Res Inst 38, Hefei 230088, Peoples R China
[6] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[7] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; infrared image; transformer; visible image; MULTI-FOCUS; NETWORK; FRAMEWORK;
DOI
10.1109/TIM.2023.3312755
CLC Classification
TM [Electrotechnics]; TN [Electronics and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Existing transformer-based infrared and visible image fusion methods mainly exploit the intra-modal self-attention correlations within each image, yet they neglect the inter-modal discrepancies between the two source images at the same position, where the information carried by the infrared token and the visible token is unbalanced. We therefore develop a pure transformer fusion model that reconstructs the fused image in the token dimension: it not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures inter-modal attentive correlations in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when fusing the corresponding infrared and visible tokens, learnable attentive weights dynamically measure the correlation of inter-modal tokens at the same position. Concretely, infrared and visible tokens are first computed by two independent transformers, which extract intra-modal long-range dependencies separately owing to the modal difference between the sources. The corresponding infrared and visible tokens are then fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlation of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model in comparison with nine state-of-the-art methods.
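The position-wise weighted token fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function `fuse_tokens` and the projection `w` standing in for the learnable attentive weights are hypothetical names, and the scoring scheme (a softmax over per-modality scores at each token position) is one plausible way to realize "dynamically measuring the correlation of inter-modal tokens at the same position."

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_tokens(ir_tokens, vis_tokens, w):
    """Fuse corresponding infrared and visible tokens position by position.

    ir_tokens, vis_tokens: (N, D) arrays of N tokens with D channels each.
    w: (D, 2) projection that scores each modality per token position
       (a stand-in for the paper's learnable attentive weights).
    """
    # Score both modalities at every token position, then normalize so the
    # two weights at each position sum to 1, balancing the unbalanced
    # inter-modal information before fusion.
    scores = np.stack([ir_tokens @ w[:, 0], vis_tokens @ w[:, 1]], axis=-1)  # (N, 2)
    alpha = softmax(scores, axis=-1)                                         # (N, 2)
    return alpha[:, :1] * ir_tokens + alpha[:, 1:] * vis_tokens              # (N, D)

rng = np.random.default_rng(0)
ir = rng.standard_normal((16, 8))   # 16 infrared tokens, 8 channels
vis = rng.standard_normal((16, 8))  # 16 visible tokens at the same positions
w = rng.standard_normal((8, 2))
fused = fuse_tokens(ir, vis, w)
print(fused.shape)  # (16, 8)
```

Because the two weights at each position are a softmax pair, every fused token is a convex combination of its infrared and visible counterparts; the multigrained extension in the paper would repeat this fusion at several token sizes.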
Pages: 14
Related Papers
50 records in total
  • [41] Infrared and visible image fusion via detail preserving adversarial learning
    Ma, Jiayi
    Liang, Pengwei
    Yu, Wei
    Chen, Chen
    Guo, Xiaojie
    Wu, Jia
    Jiang, Junjun
    INFORMATION FUSION, 2020, 54 : 85 - 98
  • [42] Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion
    Wang, Lei
    Hu, Ziming
    Kong, Quan
    Qi, Qian
    Liao, Qing
    ENTROPY, 2023, 25 (03)
  • [43] Infrared and Visible Image Fusion via Multiscale Receptive Field Amplification Fusion Network
    Ji, Chuanming
    Zhou, Wujie
    Lei, Jingsheng
    Ye, Lv
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 493 - 497
  • [44] Adjustable Visible and Infrared Image Fusion
    Wu, Boxiong
    Nie, Jiangtao
    Wei, Wei
    Zhang, Lei
    Zhang, Yanning
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13463 - 13477
  • [45] Restorable Visible and Infrared Image Fusion
    Kang, Jihun
    Horita, Daichi
    Tsubota, Koki
    Aizawa, Kiyoharu
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1560 - 1564
  • [46] AFT: Adaptive Fusion Transformer for Visible and Infrared Images
    Chang, Zhihao
    Feng, Zhixi
    Yang, Shuyuan
    Gao, Quanwei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2077 - 2092
  • [47] HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer
    Chen, Jun
    Ding, Jianfeng
    Ma, Jiayi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10145 - 10159
  • [48] SAM-guided multi-level collaborative Transformer for infrared and visible image fusion
    Guo, Lin
    Luo, Xiaoqing
    Liu, Yue
    Zhang, Zhancheng
    Wu, Xiaojun
    PATTERN RECOGNITION, 2025, 162
  • [49] An infrared and visible image fusion using knowledge measures for intuitionistic fuzzy sets and Swin Transformer
    Khan, Muhammad Jabir
    Jiang, Shu
    Ding, Weiping
    Huang, Jiashuang
    Wang, Haipeng
    INFORMATION SCIENCES, 2024, 648
  • [50] TBRAFusion: Infrared and visible image fusion based on two-branch residual attention Transformer
    Zhang, Wangwei
    Sun, Hao
    Zhou, Bin
    ELECTRONIC RESEARCH ARCHIVE, 2024, 33 (01): : 158 - 180