STFMamba: Spatiotemporal satellite image fusion network based on visual state space model

Cited by: 0
Authors
Zhao, Min [1 ]
Jiang, Xiaolu [2 ,3 ]
Huang, Bo [1 ]
Affiliations
[1] Univ Hong Kong, Dept Geog, Hong Kong 999077, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Ocean Sci, Hong Kong 999077, Peoples R China
[3] Hong Kong Univ Sci & Technol, Ctr Ocean Res Hong Kong & Macau, Hong Kong 999077, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Remote sensing; Satellite image; Spatiotemporal fusion; Deep learning; State space model; Cross attention; Knowledge distillation; REFLECTANCE FUSION; NEURAL-NETWORK; FRAMEWORK;
DOI
10.1016/j.isprsjprs.2025.07.011
Chinese Library Classification
P9 [Physical Geography];
Discipline codes
0705 ; 070501 ;
Abstract
Remote sensing images provide extensive information about Earth's surface, supporting a wide range of applications. However, individual sensors face a trade-off between spatial and temporal resolution; spatiotemporal fusion (STF) aims to overcome this shortcoming by combining multisource data. Existing deep learning-based STF methods struggle to capture long-range dependencies (CNN-based) or incur high computational cost (Transformer-based). To overcome these limitations, we propose STFMamba, a two-step state space model that effectively captures global information while maintaining linear complexity. Specifically, a super-resolution (SR) network is first used to mitigate the sensor heterogeneity of multisource data; a dual U-Net is then designed to fully leverage spatiotemporal correlations and capture temporal variations. STFMamba contains three key components: 1) a multidimensional scanning mechanism for global relationship modeling that eliminates information loss, 2) a spatio-spectral-temporal fusion scanning strategy that integrates multiscale contextual features, and 3) a multi-head cross-attention module for adaptive selection and fusion. Additionally, we develop a lightweight version of STFMamba for deployment on resource-constrained devices, incorporating a knowledge distillation strategy to align its features with the base model and enhance performance. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed method: it outperforms the compared methods FSDAF, FVSDF, EDCSTFN, GANSTFM, SwinSTFM, and DDPMSTF with average RMSE reductions of 24.25%, 25.94%, 18.15%, 14.36%, 9.63%, and 12.82%, respectively. Our code is available at: https://github.com/zhaomin0101/STFMamba.
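To illustrate the multi-head cross-attention component mentioned in the abstract, here is a minimal numpy sketch of scaled dot-product cross-attention between two feature streams (e.g. fine- and coarse-resolution tokens). The function name, shapes, and random projection matrices are illustrative placeholders for learned weights, not the paper's actual implementation; see the linked repository for the real code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feats, context_feats, num_heads=4, rng=None):
    """Fuse `context_feats` into `query_feats` via multi-head scaled
    dot-product cross-attention. Inputs have shape (tokens, dim); the
    projection matrices here are random stand-ins for learned weights."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_q, d = query_feats.shape
    d_head = d // num_heads
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Project, then split the channel dimension into heads: (heads, tokens, d_head).
    Q = (query_feats @ Wq).reshape(n_q, num_heads, d_head).transpose(1, 0, 2)
    K = (context_feats @ Wk).reshape(-1, num_heads, d_head).transpose(1, 0, 2)
    V = (context_feats @ Wv).reshape(-1, num_heads, d_head).transpose(1, 0, 2)
    # Each query token attends over all context tokens, per head.
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    # Merge heads back into a single (tokens, dim) output.
    return (attn @ V).transpose(1, 0, 2).reshape(n_q, d)

fine = np.random.default_rng(1).standard_normal((16, 32))    # hypothetical fine-resolution tokens
coarse = np.random.default_rng(2).standard_normal((16, 32))  # hypothetical coarse/temporal tokens
fused = multi_head_cross_attention(fine, coarse)
print(fused.shape)  # (16, 32)
```

The key design point is that queries come from one stream while keys and values come from the other, so the attention weights adaptively select which context features to inject into each query location.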
Pages: 288-304
Number of pages: 17
References
52 records in total
[1]   STF-Trans: A two-stream spatiotemporal fusion transformer for very high resolution satellites images [J].
Benzenati, Tayeb ;
Kallel, Abdelaziz ;
Kessentini, Yousri .
NEUROCOMPUTING, 2024, 563
[2]   SwinSTFM: Remote Sensing Spatiotemporal Fusion Using Swin Transformer [J].
Chen, Guanyu ;
Jiao, Peng ;
Hu, Qing ;
Xiao, Linjie ;
Ye, Zijian .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[3]  
Chen HRX, 2024, Arxiv, DOI arXiv:2404.03425
[4]   RSMamba: Biologically Plausible Retinex-Based Mamba for Remote Sensing Shadow Removal [J].
Chi, Kaichen ;
Guo, Sai ;
Chu, Jun ;
Li, Qiang ;
Wang, Qi .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
[5]   Assessing the accuracy of blending Landsat-MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection [J].
Emelyanova, Irina V. ;
McVicar, Tim R. ;
Van Niel, Thomas G. ;
Li, Ling Tao ;
van Dijk, Albert I. J. M. .
REMOTE SENSING OF ENVIRONMENT, 2013, 133 :193-209
[6]  
Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
[7]  
Gu A, 2024, Arxiv, DOI arXiv:2312.00752
[8]  
Gu AL, 2022, Arxiv, DOI [arXiv:2111.00396, DOI 10.48550/ARXIV.2111.00396]
[9]  
Han Q., 2021, Indian Competit Law Rev.
[10]   A new data fusion model for high spatial- and temporal-resolution mapping of forest disturbance based on Landsat and MODIS [J].
Hilker, Thomas ;
Wulder, Michael A. ;
Coops, Nicholas C. ;
Linke, Julia ;
McDermid, Greg ;
Masek, Jeffrey G. ;
Gao, Feng ;
White, Joanne C. .
REMOTE SENSING OF ENVIRONMENT, 2009, 113 (08) :1613-1627