Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation

Cited by: 8

Authors:

Mo, Youda [1]

Li, Huihui [2,3]

Xiao, Xiangling [1]

Zhao, Huimin [1]

Liu, Xiaoyong

Zhan, Jin [1]

Affiliations:

[1] Guangdong Polytech Normal Univ, Sch Comp Sci, Guangzhou 510665, Peoples R China

[2] Guangdong Polytech Normal Univ, Sch Comp Sci, Guangzhou 510665, Peoples R China

[3] Guangdong Polytech Normal Univ, Guangdong Prov Key Lab Intellectual Property & Big, Guangzhou 510665, Peoples R China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2023 / Vol. 16

Funding:

National Natural Science Foundation of China;

Keywords:

Global local transformer block (GLTB); remote sensing (RS) image; semantic segmentation; Swin transformer; Swin-Conv-Dspp (SCD);

DOI:

10.1109/JSTARS.2023.3280365

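Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: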

Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a certain degree of success in remote sensing (RS) image semantic segmentation. However, segmentation results still suffer from serious holes, rough edges, and false or even missed detections caused by lighting and shadows. To address these problems, this article proposes SCG-TransNet, an RS semantic segmentation model that is a hybrid of the Swin transformer and DeepLabv3+ and includes a Swin-Conv-Dspp (SCD) module and a global local transformer block (GLTB). First, the SCD module, which can efficiently extract feature information from objects at different scales, is used to mitigate the hole phenomenon and reduce the loss of detailed information. Second, we construct a GLTB with a spatial pyramid pooling shuffle module to extract critical detail information from the limited visible pixels of occluded objects, which effectively alleviates the difficulty of recognizing objects under occlusion. Finally, experimental results show that SCG-TransNet achieves a mean intersection over union of 70.29% on the Vaihingen dataset, which is 3% higher than the baseline model. It also achieves good results on the Potsdam dataset. These results demonstrate the effectiveness, robustness, and superiority of the proposed method compared with existing state-of-the-art methods.
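The abstract reports its headline result as mean intersection over union (mIoU). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. The function names and toy labels are hypothetical, and this record does not state the paper's exact evaluation protocol (for example, which classes are included), so this is an illustration only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                               # class appears in truth or prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six pixels, three classes, flattened label maps.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(confusion_matrix(truth, pred, 3))
print(f"mIoU = {miou:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2. Real evaluations accumulate the confusion matrix over every pixel of every test image before averaging.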


Pages: 5284-5296

Number of pages: 13

