GTMFuse: Group-attention transformer-driven multiscale dense feature-enhanced network for infrared and visible image fusion

Cited by: 16
Authors
Mei, Liye [1 ,2 ]
Hu, Xinglong [1 ]
Ye, Zhaoyi [1 ]
Tang, Linfeng [3 ]
Wang, Ying [4 ]
Li, Di [1 ]
Liu, Yan [4 ]
Hao, Xin [5 ]
Lei, Cheng [2 ]
Xu, Chuan [1 ]
Yang, Wei [4 ,6 ]
Affiliations
[1] Hubei Univ Technol, Sch Comp Sci, Wuhan 430068, Peoples R China
[2] Wuhan Univ, Inst Technol Sci, Wuhan 430072, Peoples R China
[3] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[4] Wuchang Shouyi Univ, Sch Informat Sci & Engn, Wuhan 430064, Peoples R China
[5] Antgroup, Hangzhou 310020, Peoples R China
[6] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Remote Sensing, Wuhan 430072, Peoples R China
Keywords
Infrared and visible image fusion; Deep learning; Group-attention; Multiscale feature
DOI
10.1016/j.knosys.2024.111658
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Infrared and visible images captured by different devices can be integrated into a single composite image through image fusion techniques. However, many existing convolutional neural network-based methods for infrared and visible image fusion show limited capability to effectively combine information from the source images. We therefore propose GTMFuse, a group-attention transformer-driven multiscale dense feature-enhanced network for infrared and visible image fusion. Specifically, GTMFuse employs multiscale dual-channel encoders to process the source images independently and extract multiscale features. Within the encoders, a group-attention transformer module enables more comprehensive long-range feature dependency modeling at each scale. This module combines a fixed-direction stripe attention mechanism with channel attention and window attention, capturing global long-range information and allowing feature interaction across the source images. The multiscale features produced by the group-attention transformer module are integrated into the fused image through a carefully designed dense fusion block. Furthermore, this study introduces a new dataset named HBUT-IV, comprising surveillance images captured from multiple viewpoints, which serves as a benchmark for assessing the efficacy of fusion methods. Extensive experiments on four datasets against nine comparison methods demonstrate the superior performance of GTMFuse. The implementation code is available at https://github.com/XingLongH/GTMFuse.
Pages: 15
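The abstract describes the group-attention transformer module only at a high level. As a reading aid, the following is a minimal PyTorch sketch of the general idea: three parallel attention branches (a fixed-direction stripe branch, a window branch, and a squeeze-and-excitation-style channel branch) whose outputs are merged back into the feature map by a 1x1 convolution with a residual connection. All class names, branch designs, and hyperparameters here (horizontal stripes, 8x8 windows, 4 heads) are illustrative assumptions, not the authors' released implementation; the official code is at the GitHub link above.

import torch
import torch.nn as nn

class StripeAttention(nn.Module):
    """Self-attention along fixed-direction (here: horizontal) stripes."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)  # one stripe per row
        out, _ = self.attn(rows, rows, rows)
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

class WindowAttention(nn.Module):
    """Self-attention inside non-overlapping win x win windows."""
    def __init__(self, dim, win=8, heads=4):
        super().__init__()
        self.win = win
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # assumes H and W divisible by win
        b, c, h, w = x.shape
        s = self.win
        t = x.reshape(b, c, h // s, s, w // s, s)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, s * s, c)  # window tokens
        out, _ = self.attn(t, t, t)
        out = out.reshape(b, h // s, w // s, s, s, c)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style global channel re-weighting."""
    def __init__(self, dim, r=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // r, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class GroupAttentionBlock(nn.Module):
    """Run the three branches in parallel and merge with a 1x1 convolution."""
    def __init__(self, dim):
        super().__init__()
        self.stripe = StripeAttention(dim)
        self.window = WindowAttention(dim)
        self.channel = ChannelAttention(dim)
        self.merge = nn.Conv2d(3 * dim, dim, kernel_size=1)

    def forward(self, x):
        y = torch.cat([self.stripe(x), self.window(x), self.channel(x)], dim=1)
        return x + self.merge(y)              # residual around the merged branches

if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 64)        # one multiscale feature map
    print(GroupAttentionBlock(32)(feats).shape)  # torch.Size([1, 32, 64, 64])

Applying one such block per encoder scale and concatenating (rather than summing) the branch outputs mirrors the abstract's description of the three attention mechanisms interacting within a single module; how the original network actually orders or weights the branches may differ.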
Related papers (50 records in total)
  • [1] Multiscale channel attention network for infrared and visible image fusion
    Zhu, Jiahui
    Dou, Qingyu
    Jian, Lihua
    Liu, Kai
    Hussain, Farhan
    Yang, Xiaomin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (22)
  • [2] MAFusion: Multiscale Attention Network for Infrared and Visible Image Fusion
    Li, Xiaoling
    Chen, Houjin
    Li, Yanfeng
    Peng, Yahui
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [3] Accurate Vision-Enabled UAV Location Using Feature-Enhanced Transformer-Driven Image Matching
    Wang, Haoyang
    Zhou, Fuhui
    Wu, Qihui
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 11
  • [4] Multiscale feature learning and attention mechanism for infrared and visible image fusion
    Gao, Li
    Luo, Delin
    Wang, Song
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2024, 67 (02) : 408 - 422
  • [5] MFAGAN: A multiscale feature-attention generative adversarial network for infrared and visible image fusion
    Tang, Xuanji
    Zhao, Jufeng
    Cui, Guangmang
    Tian, Haijun
    Shi, Zhen
    Hou, Changlun
    INFRARED PHYSICS & TECHNOLOGY, 2023, 133
  • [6] Transformer-driven feature fusion network and visual feature coding for multi-label image classification
    Liu, Pingzhu
    Qian, Wenbin
    Huang, Jintao
    Tu, Yanqiang
    Cheung, Yiu-Ming
    PATTERN RECOGNITION, 2025, 164
  • [7] A Dual Cross Attention Transformer Network for Infrared and Visible Image Fusion
    Zhou, Zhuozhi
    Lan, Jinhui
    2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA, ICAIBD 2024, 2024: 494 - 499
  • [8] MATCNN: Infrared and Visible Image Fusion Method Based on Multiscale CNN With Attention Transformer
    Liu, Jingjing
    Zhang, Li
    Zeng, Xiaoyang
    Liu, Wanquan
    Zhang, Jianhua
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74