STAFusion: An Adversarial Learning Network for Infrared and Visible Image Fusion via Swin Transformer

Times cited: 0

Authors:

Zhai, Yi [1,2]

Song, Baoping [1,2]

Cheng, Jinyong [1,2]

Dong, Aimei [1,2]

Lv, Guohua [1,2]

Affiliations:

[1] Qilu Univ Technol, Key Lab Comp Power Network & Informat Secur, Minist Educ, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Shandong Acad Sci, Jinan 250300, Peoples R China

[2] Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250300, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025

Funding:

National Natural Science Foundation of China;

Keywords:

Image fusion; Feature extraction; Transformers; Convolutional neural networks; Semantics; Hands; Deep learning; Transforms; Generative adversarial networks; Adversarial machine learning; Swin Transformer; infrared image; visible image; adversarial learning; MULTI-FOCUS; FRAMEWORK; NEST;

DOI:

10.1109/TETCI.2025.3572144

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Rapidly evolving convolutional neural networks (CNNs) have performed well in image fusion tasks. However, fused images often exhibit uneven image-level perception because CNN architectures neglect the significant influence of long-range dependencies. Although Transformer-based fusion techniques can alleviate this problem, they still suffer from long input sequences, a lack of prior information, and a poor understanding of multi-scale features. To overcome these issues, a new framework for fusing infrared and visible images is proposed that integrates the Swin Transformer with adversarial learning, hence termed STAFusion; its flexible hierarchical structure captures information at multiple scales. In particular, we employ a shifted-window mechanism that computes self-attention within local windows rather than over the entire image, substantially reducing the sequence length and increasing efficiency. To retain global modeling capability, the shift operation permits interaction between neighboring windows, enabling cross-window connections between upper and lower layers. The framework incorporates a dual discriminator to ensure that the fused image preserves and enhances the distinctive modal features of the various semantic objects in the infrared and visible images, so that information from both sources is optimally distributed in the final output. Extensive experiments on public datasets demonstrate that STAFusion retains the thermal radiation of infrared images and the texture details of visible images well. Both quantitative and qualitative results corroborate the superiority of the proposed approach over existing image fusion techniques.
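The shifted-window mechanism summarized in the abstract can be illustrated with a minimal NumPy sketch. It follows the generic Swin Transformer scheme and is not STAFusion's actual implementation: the function names, the 56 x 56 x 32 feature map, the window size of 7, and the single attention head with identity Q/K/V projections are all illustrative assumptions. The attention mask that Swin applies to wrapped-around regions after the cyclic shift is also omitted.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    Returns shape (num_windows, ws*ws, C): one token sequence per window."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)            # (nH, nW, ws, ws, C)
    return x.reshape(-1, ws * ws, C)

def window_attention(x, ws):
    """Single-head self-attention computed independently inside each window.
    Assumption: identity Q/K/V projections, to keep the sketch short."""
    tokens = window_partition(x, ws)           # (nWin, ws*ws, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per query
    return weights @ tokens                    # (nWin, ws*ws, C)

def shifted_window_attention(x, ws):
    """Cyclically shift the map by ws // 2 before partitioning, so the new
    windows straddle the borders of the previous ones (cross-window mixing).
    Assumption: Swin's mask for wrapped-around regions is omitted."""
    shift = ws // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_attention(shifted, ws)

H = W = 56
C = 32
ws = 7
x = np.random.default_rng(0).standard_normal((H, W, C))
out = window_attention(x, ws)
out_s = shifted_window_attention(x, ws)
print(out.shape)                     # (64, 49, 32)

# Number of entries in the attention matrices: global vs. windowed
global_pairs = (H * W) ** 2
window_pairs = (H // ws) * (W // ws) * (ws * ws) ** 2
print(global_pairs, window_pairs)    # 9834496 153664
```

With 7 x 7 windows, the attention matrices hold 64 times fewer entries than global attention over all 3,136 tokens (the ratio equals the number of windows). This is the sequence-length reduction the abstract refers to. Alternating unshifted and shifted layers is what lets information cross window borders.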


Pages: 15

Cited references: 55
