DGLT-Fusion: A decoupled global-local infrared and visible image fusion transformer

Cited by: 14
Authors
Yang, Xin [1]
Huo, Hongtao [1]
Wang, Renhua [1]
Li, Chang [2]
Liu, Xiaowen [1]
Li, Jing [3]
Affiliations
[1] Peoples Publ Secur Univ China, Sch Informat Technol & Cyber Secur, Beijing 100038, Peoples R China
[2] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[3] Cent Univ Finance & Econ, Sch Informat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Infrared image; Visible image; Transformer; Convolutional neural networks; Image fusion; Network;
DOI
10.1016/j.infrared.2022.104522
CLC (Chinese Library Classification)
TH7 [Instruments and Meters];
Discipline Classification Codes
0804; 080401; 081102;
Abstract
Convolutional neural network (CNN) and generative adversarial network (GAN) based approaches have achieved substantial performance in the image fusion field. However, these methods focus on extracting local features and pay little attention to learning global dependencies. In recent years, given its competitive long-term dependency modeling capability, the Transformer-based fusion method has made impressive achievements, but it processes long-term correspondences and short-term features simultaneously, which might result in deficient global-local information interaction. Towards this end, we propose a decoupled global-local infrared and visible image fusion Transformer (DGLT-Fusion). DGLT-Fusion decouples global-local information learning into a Transformer module and a CNN module. Long-term dependencies are modeled by a series of Transformer blocks (global-decoupled Transformer blocks), while short-term features are extracted by local-decoupled convolution blocks. In addition, we design Transformer dense connections to retain more information. The two modules are stacked in an interweaving manner, which enables our network to retain texture and detail information more completely. Furthermore, comparative experimental results show that DGLT-Fusion achieves better performance than state-of-the-art approaches.
Pages: 14
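As an illustration of the decoupled design the abstract describes (Transformer blocks for global dependencies, convolution blocks for local features, dense connections among the Transformer blocks, and interweaved stacking of the two module types), the following is a minimal PyTorch sketch. Every name (GlobalDecoupledBlock, LocalDecoupledBlock, DGLTFusionSketch), the channel width, the depth, and the 1x1-convolution dense fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LocalDecoupledBlock(nn.Module):
    """Convolution block for short-term (local) feature extraction (name assumed)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class GlobalDecoupledBlock(nn.Module):
    """Transformer block for long-term (global) dependency modeling (name assumed)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * 2),
            nn.GELU(),
            nn.Linear(channels * 2, channels),
        )

    def forward(self, x):
        # Flatten the feature map (B, C, H, W) into a token sequence (B, H*W, C).
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)
        n = self.norm1(t)
        a, _ = self.attn(n, n, n)          # global self-attention over all positions
        t = t + a
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)

class DGLTFusionSketch(nn.Module):
    """Interweaved stack of global (Transformer) and local (CNN) blocks,
    with dense connections linking the Transformer block outputs."""
    def __init__(self, channels=32, depth=3):
        super().__init__()
        self.embed = nn.Conv2d(2, channels, 3, padding=1)  # concatenated IR + VIS input
        self.global_blocks = nn.ModuleList(
            GlobalDecoupledBlock(channels) for _ in range(depth))
        self.local_blocks = nn.ModuleList(
            LocalDecoupledBlock(channels) for _ in range(depth))
        # 1x1 convs that fuse all earlier Transformer outputs (assumed dense-link form).
        self.dense_fuse = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, 1) for i in range(depth))
        self.head = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, ir, vis):
        x = self.embed(torch.cat([ir, vis], dim=1))
        dense = []
        for g, l, fuse in zip(self.global_blocks, self.local_blocks, self.dense_fuse):
            dense.append(g(x))                     # global branch output
            g_out = fuse(torch.cat(dense, dim=1))  # dense Transformer connections
            x = l(g_out)                           # local branch refines the result
        return torch.sigmoid(self.head(x))

A quick smoke test: DGLTFusionSketch()(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)) returns a (1, 1, 64, 64) fused image. Alternating the two block types at every stage is one way to realize the global-local interaction the abstract argues for; the paper itself may stack and connect the modules differently.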