SSDT: Scale-Separation Semantic Decoupled Transformer for Semantic Segmentation of Remote Sensing Images

Cited by: 7

Authors:

Zheng, Chengyu [1]

Jiang, Yanru [1]

Lv, Xiaowei [1]

Nie, Jie [1]

Liang, Xinyue [1]

Wei, Zhiqiang [1]

Affiliations:

[1] Ocean Univ China, Coll Informat Sci & Engn, Qingdao 266005, Peoples R China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2024 / Vol. 17

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Feature extraction; Transformers; Semantic segmentation; Remote sensing; Computational modeling; Vegetation mapping; Geophysical image processing; geoscience and remote sensing; semantic segmentation;

DOI:

10.1109/JSTARS.2024.3383066

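Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: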

0808 ; 0809 ;

Abstract:

Semantic segmentation of remote sensing (RS) images classifies an image pixel by pixel to achieve semantic decoupling of the image. Most traditional semantic decoupling methods only decouple and do not perform scale separation, which causes two problems. If the feature extractor is too large, it ignores small-scale targets; if it is too small, it splits large-scale target objects apart and reduces segmentation accuracy. To address this, we propose a scale-separated semantic decoupled transformer (SSDT), which first performs scale separation during semantic decoupling and then uses the resulting scale-information-rich semantic features to guide the Transformer in extracting features. The network consists of five modules: scale-separated patch extraction (SPE), semantic decoupled transformer (SDT), scale-separated feature extraction (SFE), semantic decoupling (SD), and multiview feature fusion decoder (MFFD). In particular, SPE turns the original image into linear embedding sequences at three scales; SD divides pixels into different semantic clusters by K-means and further obtains scale-information-rich semantic features; SDT, whose core is decoupled attention, improves intraclass compactness and interclass looseness by calculating the similarity between semantic features and image features. Finally, MFFD fuses salient features from different perspectives to further enhance the feature representation. Experiments on two large-scale fine-resolution RS image datasets (Vaihingen and Potsdam) demonstrate the effectiveness of the proposed SSDT strategy in RS image semantic segmentation tasks.
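The abstract states that SPE turns the image into embedding sequences at three scales. The following is only a rough sketch of that idea, not the authors' implementation: it splits an image into non-overlapping patches at three assumed patch sizes (4, 8, 16) and flattens each patch into a vector. The function names, the patch sizes, and the omission of the learned linear projection are all assumptions made for illustration.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Expose the patch grid: (grid_row, row_in_patch, grid_col, col_in_patch, C).
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # Bring the two grid axes together so each patch is contiguous.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def multi_scale_sequences(image, patch_sizes=(4, 8, 16)):
    """Build one patch sequence per scale (patch sizes are hypothetical)."""
    return {p: extract_patches(image, p) for p in patch_sizes}


image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
sequences = multi_scale_sequences(image)
for size, seq in sequences.items():
    print(size, seq.shape)
```

In a full model, each flattened patch would typically pass through a learned linear layer to produce the embedding; that step is left out here because the record gives no details about it.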

Pages: 9037-9052

Page count: 16
