Semantic Labeling of High-Resolution Images Using EfficientUNets and Transformers

Cited by: 15
Authors
Almarzouqi, Hasan [1 ]
Saoud, Lyes Saad [2 ]
Affiliations
[1] Khalifa Univ, Elect Engn & Comp Sci Dept, Abu Dhabi 127788, U Arab Emirates
[2] Khalifa Univ, Mech Engn Dept, Abu Dhabi 127788, U Arab Emirates
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023, Vol. 61
Keywords
Transformers; Feature extraction; Remote sensing; Semantics; Semantic segmentation; Image resolution; Data models; Convolutional neural networks (CNNs); EfficientNet; fusion networks; SEGMENTATION; NETWORK; CLASSIFICATION; FOREST
DOI
10.1109/TGRS.2023.3268159
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Semantic segmentation requires approaches that learn high-level features while handling enormous quantities of data. Convolutional neural networks (CNNs) can learn unique and adaptive features toward this goal; however, because of the large size and high spatial resolution of remote sensing images, these networks cannot efficiently analyze an entire scene. Recently, deep transformers have proven capable of capturing global interactions between different objects in an image. In this article, we propose a new segmentation model that combines CNNs with transformers and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to efficiently represent the multimodal inputs and outputs of the network. The input fusion layer extracts feature maps summarizing the relationship between image content and elevation maps [digital surface model (DSM)]. The output fusion layer uses a novel multitask segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method (FMM) is used to convert unidentified class labels into their closest known neighbors. Our results demonstrate that the proposed method improves segmentation accuracy compared with state-of-the-art techniques.
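The final post-processing step described above, assigning each unidentified pixel the label of its closest known neighbor, can be sketched in a few lines. This is an illustrative approximation only: it replaces the paper's fast-marching method with a plain Euclidean distance transform (`scipy.ndimage.distance_transform_edt`), which yields the same "nearest known label" fill on a uniform grid; the function name and the `-1` unknown marker are assumptions, not from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_unknown_labels(labels, unknown=-1):
    """Replace pixels marked `unknown` with the label of the nearest
    known pixel (Euclidean-distance approximation of the FMM step)."""
    mask = labels == unknown
    # For every pixel, get the indices of the nearest pixel where mask
    # is False (i.e., the nearest pixel with a known label).
    _, idx = distance_transform_edt(mask, return_indices=True)
    return labels[idx[0], idx[1]]

label_map = np.array([[0, 0, -1],
                      [1, -1, -1],
                      [1, 1, 2]])
print(fill_unknown_labels(label_map))
```

Known pixels are their own nearest known neighbor, so they pass through unchanged; only the `-1` entries are rewritten. A true FMM implementation (e.g., the `scikit-fmm` package) would instead propagate labels along geodesic arrival times, which matters when the fill should respect image boundaries rather than straight-line distance.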
Pages: 13