A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter

Cited by: 19
Authors
Feng, Dongdong
Zhang, Zhihua [1 ]
Yan, Kun
Affiliations
[1] Lanzhou Jiaotong Univ, Fac Geomat, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Feature extraction; Transformers; Remote sensing; Convolution; Semantics; Image edge detection; FAM; Gabor filter; remote sensing; semantic segmentation; Swin transformer; SCENE CLASSIFICATION; ATTENTION; MODEL;
DOI
10.1109/ACCESS.2022.3193248
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Semantic segmentation of remote sensing images is increasingly important in urban planning, autonomous driving, disaster monitoring, and land cover classification. With the development of high-resolution remote sensing satellite technology, multilevel, large-scale, and high-precision segmentation has become a focus of current research. High-resolution remote sensing images exhibit high intraclass diversity and low interclass separability, which challenges the precise representation of multiscale details. In this paper, a semantic segmentation method for remote sensing images that fuses a Swin Transformer with a Gabor filter is proposed. First, a Swin Transformer is used as the backbone network to extract image information at different levels. Then, the texture and edge features of the input image are extracted with a Gabor filter, and the multilevel features are merged by introducing a feature aggregation module (FAM) and an attentional embedding module (AEM). Finally, the segmentation result is refined with a fully connected conditional random field (FC-CRF). The proposed method, Swin-S-GF, achieves mean Intersection over Union (mIoU) scores of 80.14%, 66.50%, and 70.61% on the large-scale classification set, the fine land-cover classification set, and the "AI + Remote Sensing imaging" dataset (AI+RS), respectively, exceeding DeepLabV3 by 0.67%, 3.43%, and 3.80%. We therefore believe this model is a good tool for high-precision semantic segmentation of remote sensing images.
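The Gabor branch the abstract describes extracts texture and edge responses at several orientations. A minimal sketch of such a filter bank is shown below; the kernel size and parameters (`ksize=7`, `sigma=2.0`, `lambd=4.0`, four orientations) are illustrative assumptions and not the paper's actual settings.

```python
import numpy as np

def gabor_kernel(ksize=7, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real (cosine) Gabor kernel: a Gaussian envelope times an oriented sinusoid."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by theta so the sinusoid runs along x'.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lambd + psi)
    return envelope * carrier

def filter_image(img, kernel):
    """Correlate an image with a kernel, reflect-padding to keep the shape."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Toy image with vertical stripes: intensity varies along x only.
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))

# Four-orientation bank; the theta=0 filter should respond most strongly.
bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
responses = [filter_image(img, k) for k in bank]
```

In the paper these responses would feed the FAM/AEM fusion stages alongside the Swin Transformer features; here they simply illustrate how orientation-selective texture and edge energy is obtained.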
Pages: 77432-77451
Page count: 20
Cited References
71 records
[1] [Anonymous], 2017, P 2 INT C EL EL ENG.
[2] Bazi, Yakoub; Bashmal, Laila; Rahhal, Mohamad M. Al; Dayil, Reham Al; Ajlan, Naif Al. Vision Transformers for Remote Sensing Image Classification [J]. REMOTE SENSING, 2021, 13(03): 1-20.
[3] Cao, Yue; Xu, Jiarui; Lin, Stephen; Wei, Fangyun; Hu, Han. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [C]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 1971-1980.
[4] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers [C]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229.
[5] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40(04): 834-848.
[6] Cotter A., 2011, arXiv, DOI arXiv:1109.4603.
[7] Dagher, Issam; Abujamra, Samir. Combined wavelet and Gabor convolution neural networks [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2019, 17(06).
[8] Diakogiannis, Foivos I.; Waldner, Francois; Caccetta, Peter; Wu, Chen. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2020, 162: 94-114.
[9] Dong LH, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5884, DOI 10.1109/ICASSP.2018.8462506.
[10] Dosovitskiy A., 2021, P 9 INT C LEARN REPR.