A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter

Cited by: 19
Authors
Feng, Dongdong
Zhang, Zhihua [1 ]
Yan, Kun
Affiliations
[1] Lanzhou Jiaotong Univ, Fac Geomat, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Feature extraction; Transformers; Remote sensing; Convolution; Semantics; Image edge detection; FAM; Gabor filter; remote sensing; semantic segmentation; Swin transformer; SCENE CLASSIFICATION; ATTENTION; MODEL;
DOI
10.1109/ACCESS.2022.3193248
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Semantic segmentation of remote sensing images is increasingly important in urban planning, autonomous driving, disaster monitoring, and land cover classification. With the development of high-resolution remote sensing satellite technology, multilevel, large-scale, and high-precision segmentation has become a focus of current research. High-resolution remote sensing images exhibit high intraclass diversity and low interclass separability, which challenges the precise representation of multiscale details. In this paper, a semantic segmentation method for remote sensing images that fuses a Swin Transformer with a Gabor filter is proposed. First, a Swin Transformer is used as the backbone network to extract image information at different levels. Then, the texture and edge features of the input image are extracted with a Gabor filter, and the multilevel features are merged by introducing a feature aggregation module (FAM) and an attentional embedding module (AEM). Finally, the segmentation result is refined with a fully connected conditional random field (FC-CRF). The proposed method, Swin-S-GF, achieves mean Intersection over Union (mIoU) scores of 80.14%, 66.50%, and 70.61% on the large-scale classification set, the fine land-cover classification set, and the "AI + Remote Sensing imaging" dataset (AI+RS), respectively, exceeding DeepLabV3 by 0.67%, 3.43%, and 3.80%. We therefore believe this model is a good tool for high-precision semantic segmentation of remote sensing images.
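The Gabor branch the abstract describes extracts texture and edge responses at several orientations. A minimal sketch of such a filter bank is shown below; the kernel size and parameters (`ksize=7`, `sigma=2.0`, `lambd=4.0`, four orientations) are illustrative assumptions and not the paper's actual settings.

```python
import numpy as np

def gabor_kernel(ksize=7, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real (cosine) Gabor kernel: a Gaussian envelope times an oriented sinusoid."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by theta so the sinusoid runs along x'.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lambd + psi)
    return envelope * carrier

def filter_image(img, kernel):
    """Correlate an image with a kernel, reflect-padding to keep the shape."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Toy image with vertical stripes: intensity varies along x only.
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))

# Four-orientation bank; the theta=0 filter should respond most strongly.
bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
responses = [filter_image(img, k) for k in bank]
```

In the paper these responses would feed the FAM/AEM fusion stages alongside the Swin Transformer features; here they simply illustrate how orientation-selective texture and edge energy is obtained.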
Pages: 77432-77451
Page count: 20
Cited References
71 records
[1] [Anonymous], 2017, P 2 INT C EL EL ENG.
[2] Bazi, Yakoub; Bashmal, Laila; Rahhal, Mohamad M. Al; Dayil, Reham Al; Ajlan, Naif Al. Vision Transformers for Remote Sensing Image Classification [J]. REMOTE SENSING, 2021, 13(03): 1-20.
[3] Cao, Yue; Xu, Jiarui; Lin, Stephen; Wei, Fangyun; Hu, Han. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [C]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 1971-1980.
[4] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers [C]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229.
[5] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40(04): 834-848.
[6] Cotter A., 2011, arXiv, DOI arXiv:1109.4603.
[7] Dagher, Issam; Abujamra, Samir. Combined wavelet and Gabor convolution neural networks [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2019, 17(06).
[8] Diakogiannis, Foivos I.; Waldner, Francois; Caccetta, Peter; Wu, Chen. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2020, 162: 94-114.
[9] Dong LH, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5884, DOI 10.1109/ICASSP.2018.8462506.
[10] Dosovitskiy A., 2021, P 9 INT C LEARN REPR.