SCMFusion: Semantic Constrained Multi-Scale Fusion Network for infrared and visible image fusion

Cited by: 0
Authors
Shi, Liuyan [1]
Nie, Rencan [1,2]
Cao, Jinde [3,4,5]
Liu, Xuheng [1]
Li, Xiaoli [1]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650500, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[3] Southeast Univ, Sch Math, Nanjing 211189, Peoples R China
[4] Purple Mt Labs, Nanjing 211111, Peoples R China
[5] Ahlia Univ, Manama 10878, Bahrain
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Semantic constraint; Frequency-Aware Bidirectional Pyramid; Cross-Modal Cross-Scale Fusion; Dual-scale transformer; Generative adversarial network;
DOI
10.1016/j.optlastec.2025.113097
CLC Number
O43 [Optics];
Subject Classification Codes
070207; 0803;
Abstract
Because infrared and visible images have distinct characteristics, we introduce Semantic Constrained Multi-Scale Fusion (SCMFusion) to balance modality-unique and shared features during infrared and visible image fusion (IVIF), reducing redundancy while comprehensively representing the scene captured by both modalities. First, the semantic-constrained Frequency-Aware Bidirectional Pyramid (FABP) combines a spatial pyramid, which keeps the channel dimension fixed and enlarges the receptive field by reducing resolution, with a channel pyramid, which preserves scale consistency and enriches feature expression by increasing the channel count. The extracted features then undergo Semantic-Constrained Cross-Modal Cross-Scale Fusion (SC-CCF) for effective information exchange and fusion. Next, semantic constraints enforce pixel-wise alignment between the fused features and the original images, integrating modality-specific features and enhancing shared ones. Finally, a Reconstruction Block (RB) processes the high- and low-frequency components to produce the fused image. Comparative experiments show that our model outperforms 11 state-of-the-art (SOTA) fusion methods and achieves notable results in downstream object detection.
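To make the two FABP branches concrete, below is a minimal PyTorch sketch of a spatial pyramid (fixed channel width, shrinking resolution) and a channel pyramid (fixed resolution, growing channel width), plus an illustrative pixel-wise alignment term. All module names, channel widths, level counts, and the loss form are assumptions for illustration; the paper's actual FABP, SC-CCF, semantic constraint, and RB designs are not reproduced here.

import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, stride=1):
    # 3x3 convolution + LeakyReLU shared by both pyramid branches.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.LeakyReLU(0.1, inplace=True),
    )


class SpatialPyramid(nn.Module):
    # Keeps the channel count fixed and halves the resolution at each level,
    # enlarging the receptive field (the abstract's "spatial pyramid").
    def __init__(self, channels=32, levels=3):
        super().__init__()
        self.levels = nn.ModuleList(
            [conv_block(channels, channels, stride=2) for _ in range(levels)]
        )

    def forward(self, x):
        feats = []
        for level in self.levels:
            x = level(x)  # resolution halves, channels unchanged
            feats.append(x)
        return feats


class ChannelPyramid(nn.Module):
    # Keeps the resolution fixed and doubles the channels at each level,
    # enriching feature expression (the abstract's "channel pyramid").
    def __init__(self, channels=32, levels=3):
        super().__init__()
        self.levels = nn.ModuleList(
            [conv_block(channels * 2 ** i, channels * 2 ** (i + 1))
             for i in range(levels)]
        )

    def forward(self, x):
        feats = []
        for level in self.levels:
            x = level(x)  # channels double, resolution unchanged
            feats.append(x)
        return feats


class TwoPyramidExtractor(nn.Module):
    # Runs both pyramid branches on a shallow embedding of one modality.
    def __init__(self, in_ch=1, channels=32, levels=3):
        super().__init__()
        self.embed = conv_block(in_ch, channels)
        self.spatial = SpatialPyramid(channels, levels)
        self.channel = ChannelPyramid(channels, levels)

    def forward(self, x):
        x = self.embed(x)
        return self.spatial(x), self.channel(x)


def pixel_alignment_loss(fused, ir, vis):
    # Illustrative pixel-wise alignment term: pulls the fused image toward
    # the per-pixel maximum of the two sources, a common IVIF heuristic.
    # The paper's actual semantic constraint is not specified here.
    return torch.mean(torch.abs(fused - torch.maximum(ir, vis)))


if __name__ == "__main__":
    ir = torch.randn(1, 1, 128, 128)  # single-channel infrared input
    extractor = TwoPyramidExtractor()
    spatial_feats, channel_feats = extractor(ir)
    print([f.shape for f in spatial_feats])  # shrinking spatial sizes
    print([f.shape for f in channel_feats])  # growing channel widths

In the described method the two branches run in parallel and their multi-scale outputs feed SC-CCF; the sketch only illustrates the complementary shape behavior of each branch.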
Pages: 14