Hybrid Shunted Transformer embedding UNet for remote sensing image semantic segmentation

Cited by: 3
Authors
Zhou H. [1,5]
Xiao X. [1]
Li H. [1,2]
Liu X. [3,4]
Liang P. [1]
Affiliations
[1] School of Computer Science, Guangdong Polytechnic Normal University, Guangdong, Guangzhou
[2] Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangdong, Guangzhou
[3] School of Data Science and Engineering, Guangdong Polytechnic Normal University, Guangdong, Guangzhou
[4] Academy of Heyuan, Guangdong Polytechnic Normal University, Guangdong, Heyuan
[5] The Cyberspace Institute of Advanced Technology, Guangzhou University, Guangdong, Guangzhou
Funding
National Natural Science Foundation of China
Keywords
Multi-scale; Remote Sensing Image (RSI); Semantic segmentation; Shunted Transformer
DOI
10.1007/s00521-024-09888-4
Abstract
With the development of deep learning, Remote Sensing Image (RSI) semantic segmentation has made significant advances. However, due to the sparse distribution of objects and the high similarity between classes, semantic segmentation of RSI remains extremely challenging. In this paper, we propose HST-UNet, a novel hybrid semantic segmentation framework for RSI that uses a Shunted Transformer as the encoder and a Multi-Scale Convolutional Attention Network (MSCAN) as the decoder, overcoming the shortcomings of existing models by extracting and recovering both the global and local features of RSI. Furthermore, to better fuse information from the encoder and the decoder and to alleviate ambiguity, we design a Learnable Weighted Fusion (LWF) module that effectively connects encoder features to the decoder. Extensive experiments demonstrate that the proposed HST-UNet outperforms state-of-the-art methods, achieving MIoU/F1 scores of 71.44%/83.00% on the ISPRS Vaihingen dataset and 77.36%/87.09% on the ISPRS Potsdam dataset. The code will be available at https://github.com/HC-Zhou/HST-UNet. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
Pages: 15705-15720
Number of pages: 15
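
The abstract describes a hybrid encoder-decoder in which a Learnable Weighted Fusion (LWF) module connects Shunted Transformer encoder features to MSCAN decoder features. Below is a minimal PyTorch sketch of such a weighted skip-fusion block for illustration only; the class name, channel projections, and softmax-normalised scalar weights are assumptions and not the authors' exact implementation (see the linked repository for the official code).

```python
# Hypothetical sketch of a learnable weighted fusion of encoder and decoder
# features, as an illustration of the idea summarised in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableWeightedFusion(nn.Module):
    """Fuse an encoder skip feature with an upsampled decoder feature
    using learnable per-branch weights (assumed layout, not the paper's)."""

    def __init__(self, enc_channels: int, dec_channels: int, out_channels: int):
        super().__init__()
        # Project both branches to a common channel width.
        self.enc_proj = nn.Conv2d(enc_channels, out_channels, kernel_size=1)
        self.dec_proj = nn.Conv2d(dec_channels, out_channels, kernel_size=1)
        # Two learnable fusion weights, normalised with softmax at run time.
        self.weights = nn.Parameter(torch.ones(2))
        self.refine = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser decoder feature to the encoder resolution.
        dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        w = torch.softmax(self.weights, dim=0)
        fused = w[0] * self.enc_proj(enc_feat) + w[1] * self.dec_proj(dec_feat)
        return self.refine(fused)


if __name__ == "__main__":
    # Example: fuse a 1/8-resolution encoder map with a 1/16-resolution decoder map.
    lwf = LearnableWeightedFusion(enc_channels=256, dec_channels=512, out_channels=256)
    enc = torch.randn(1, 256, 64, 64)
    dec = torch.randn(1, 512, 32, 32)
    print(lwf(enc, dec).shape)  # torch.Size([1, 256, 64, 64])
```

The learnable weights let the network decide, per fusion stage, how much to trust the high-resolution encoder detail versus the semantically richer decoder context, rather than concatenating the two unconditionally.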