DiffMamba: semantic diffusion guided feature modeling network for semantic segmentation of remote sensing images

Times cited: 1
Authors
Wang, Zhen [1,2]
Xu, Nan [3]
You, Zhuhong [2]
Zhang, Shanwen [1]
Affiliations
[1] Xijing Univ, Sch Elect Informat, Xian, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Shaanxi, Peoples R China
[3] Hohai Univ, Sch Earth Sci & Engn, Nanjing, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Diffusion model; state space model (SSM); encoder-decoder framework; remote sensing images; semantic segmentation; CLASSIFICATION
DOI
10.1080/15481603.2025.2484829
Chinese Library Classification (CLC) number
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
With the rapid development of remote sensing technology, the application scope of high-resolution remote sensing images (HR-RSIs) has continued to expand. Convolutional neural networks (CNNs) and Transformer models have significantly improved the accuracy of semantic segmentation. However, these methods focus mainly on local feature extraction and on modeling long-range dependencies in global information, and they neglect the spatial correlation among local features, which leads to poor segmentation of small-scale regions. To address this issue, we build on the diffusion model and the state space model (SSM) to propose a semantic diffusion guided feature modeling network (DiffMamba) for HR-RSI semantic segmentation. DiffMamba uses a hybrid CNN-Transformer encoder and is equipped with four modules: the efficient phase sensing module (EPSM), the multi-view transformer module (MVTrans), the semantic diffusion alignment module (SDAM), and the coordinate state space model (CAMamba). EPSM enhances local feature representation along the channel dimension, using the phase information of object-region features to improve local information interaction and filter out clutter noise. MVTrans observes the spatial location information of object regions from multiple perspectives to obtain refined global context. SDAM uses a diffusion propagation process to fuse local and global information, alleviating the feature redundancy caused by the semantic differences between them. CAMamba applies state space transformations to model the correlations among the enhanced local features and guides feature decoding to produce refined semantic segmentation results. Extensive experiments on the widely used ISPRS 2-D Semantic Labeling dataset and the 15-Class Gaofen Image dataset confirm the superior efficiency of DiffMamba over several state-of-the-art methods.
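The abstract names the state space model (SSM) as the basis of CAMamba but gives no equations. As a point of reference only, here is a minimal sketch of the generic linear SSM recurrence used by S4/Mamba-style layers (cf. the Gu 2024 arXiv preprint in the reference list), with zero-order-hold (ZOH) discretization of a diagonal state matrix. It is not the authors' CAMamba: the function names, the time-invariant (non-selective) parameters, and the row-by-row raster scan that turns a 2-D feature map into a 1-D sequence are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def zoh_discretize(
    a: Sequence[float], b: Sequence[float], delta: float
) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    Per state dimension n (a[n] must be nonzero; negative for a stable system):
        a_bar[n] = exp(delta * a[n])
        b_bar[n] = (exp(delta * a[n]) - 1) / a[n] * b[n]
    """
    if any(an == 0.0 for an in a):
        raise ValueError("diagonal entries of A must be nonzero")
    a_bar = [math.exp(delta * an) for an in a]
    b_bar = [(math.exp(delta * an) - 1.0) / an * bn for an, bn in zip(a, b)]
    return a_bar, b_bar


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    delta: float,
) -> List[float]:
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = C . h_t over a sequence.

    Parameters are time-invariant in this sketch; Mamba-style selective SSMs
    instead compute delta, B and C from each input token.
    """
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h = [0.0] * len(a)  # hidden state starts at zero
    ys: List[float] = []
    for xt in x:
        # Elementwise update, valid because A is diagonal.
        h = [ab * hn + bb * xt for ab, hn, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


def raster_scan(feature_map: Sequence[Sequence[float]]) -> List[float]:
    """Flatten an H x W single-channel map row by row (hypothetical scan order)."""
    return [v for row in feature_map for v in row]


if __name__ == "__main__":
    fmap = [[0.2, 0.8, 0.1], [0.5, 0.0, 0.9]]  # toy 2 x 3 feature map
    seq = raster_scan(fmap)
    y = ssm_scan(seq, a=[-1.0, -0.5], b=[1.0, 1.0], c=[0.7, 0.3], delta=0.1)
    print([round(v, 4) for v in y])
```

Because A is diagonal, each step costs O(N) for N state dimensions, so the scan is linear in sequence length; this linear cost, compared with the quadratic cost of full self-attention, is the usual motivation for SSM-based layers on large image tiles.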
Pages: 32
References
69 entries in total
[1]   A Superpixel-Guided Unsupervised Fast Semantic Segmentation Method of Remote Sensing Images [J].
Chen, Guanzhou ;
He, Chanjuan ;
Wang, Tong ;
Zhu, Kun ;
Liao, Puyun ;
Zhang, Xiaodong .
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[2]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[3]   Adaptive Effective Receptive Field Convolution for Semantic Segmentation of VHR Remote Sensing Images [J].
Chen, Xi ;
Li, Zhiqiang ;
Jiang, Jie ;
Han, Zhen ;
Deng, Shiyi ;
Li, Zhihong ;
Fang, Tao ;
Huo, Hong ;
Li, Qingli ;
Liu, Min .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (04) :3532-3546
[4]   DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images [J].
Demir, Ilke ;
Koperski, Krzysztof ;
Lindenbaum, David ;
Pang, Guan ;
Huang, Jing ;
Bast, Saikat ;
Hughes, Forest ;
Tuia, Devis ;
Raskar, Ramesh .
PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, :172-181
[5]   Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images [J].
Ding, Lei ;
Lin, Dong ;
Lin, Shaofu ;
Zhang, Jing ;
Cui, Xiaojie ;
Wang, Yuebin ;
Tang, Hao ;
Bruzzone, Lorenzo .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[6]   LANet: Local Attention Embedding to Improve the Semantic Segmentation of Remote Sensing Images [J].
Ding, Lei ;
Tang, Hao ;
Bruzzone, Lorenzo .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (01) :426-435
[7]   Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [J].
Ding, Xiaohan ;
Zhang, Xiangyu ;
Han, Jungong ;
Ding, Guiguang .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :11953-11965
[8]   Dosovitskiy A, 2021, arXiv:2010.11929
[9]   Dual Attention Network for Scene Segmentation [J].
Fu, Jun ;
Liu, Jing ;
Tian, Haijie ;
Li, Yong ;
Bao, Yongjun ;
Fang, Zhiwei ;
Lu, Hanqing .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :3141-3149
[10]  Gu A, 2024, arXiv:2312.00752, DOI 10.48550/arXiv.2312.00752