Road Extraction from Remote Sensing Images Using a Skip-Connected Parallel CNN-Transformer Encoder-Decoder Model

Cited by: 0
Authors
Gui, Linger [1]
Gu, Xingjian [1]
Huang, Fen [1]
Ren, Shougang [1]
Qin, Huanhuan [1]
Fan, Chengcheng [2,3,4]
Affiliations
[1] Nanjing Agr Univ, Coll Artificial Intelligence, Nanjing 210095, Peoples R China
[2] Chinese Acad Sci, Innovat Acad Microsatellites, Shanghai 201210, Peoples R China
[3] Shanghai Engn Ctr Microsatellites, Shanghai 201210, Peoples R China
[4] Chinese Acad Sci, Key Lab Satellite Digitalizat Technol, Shanghai 201210, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, No. 3
Keywords
semantic segmentation; road extraction; remote sensing image; Transformer; CNN;
DOI
10.3390/app15031427
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Extracting roads from remote sensing images holds significant practical value in fields such as urban planning, traffic management, and disaster monitoring. Current Convolutional Neural Network (CNN) methods, whose inductive biases enable robust local feature learning, deliver impressive results; however, their localized receptive fields make it difficult to capture global context and to accurately extract the linear features of roads. To address these shortcomings, this paper proposes a novel parallel encoder architecture that integrates a CNN Encoder Module (CEM) with a Transformer Encoder Module (TEM). This integration combines the CEM's strength in local feature extraction with the TEM's ability to model global context, so the two branches complement each other and overcome the limitations of both CNNs and Transformers. The architecture further includes a Linear Convolution Module (LCM), which applies linear convolutions tailored to the shape and distribution of roads; by capturing image features along four specific directions, the LCM significantly improves the model's ability to detect and represent global and linear road features. Experimental results show that the proposed method achieves substantial improvements on the German-Street Dataset and the Massachusetts Roads Dataset, increasing the Intersection over Union (IoU) of the road class by at least 3% and the overall F1 score by at least 2%.
Pages: 14
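
The abstract describes a Linear Convolution Module (LCM) that applies linear (strip) convolutions along four directions to capture elongated road features. The record does not include any code, so the following is only a minimal, hypothetical PyTorch sketch of what such a four-direction block could look like, assuming the directions are horizontal, vertical, and the two diagonals, and that the directional responses are fused by a 1x1 convolution with a residual connection. All names (DirectionalLinearConv, channels, k) are illustrative and are not taken from the authors' implementation.

```python
# Minimal sketch of a four-direction linear (strip) convolution block.
# This is an illustrative assumption about the LCM, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirectionalLinearConv(nn.Module):
    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        pad = k // 2
        # Horizontal (1 x k) and vertical (k x 1) strip convolutions.
        self.h_conv = nn.Conv2d(channels, channels, (1, k), padding=(0, pad))
        self.v_conv = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0))
        # Diagonal directions: k x k kernels masked to the main diagonal and
        # anti-diagonal, so only a one-pixel-wide line of weights is active.
        self.d1_conv = nn.Conv2d(channels, channels, k, padding=pad)
        self.d2_conv = nn.Conv2d(channels, channels, k, padding=pad)
        eye = torch.eye(k)
        self.register_buffer("d1_mask", eye.reshape(1, 1, k, k))
        self.register_buffer("d2_mask", torch.flip(eye, dims=[1]).reshape(1, 1, k, k))
        # 1x1 convolution that fuses the four directional responses.
        self.fuse = nn.Sequential(
            nn.Conv2d(4 * channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Apply the diagonal convolutions with masked weights.
        d1 = F.conv2d(x, self.d1_conv.weight * self.d1_mask,
                      self.d1_conv.bias, padding=self.d1_conv.padding)
        d2 = F.conv2d(x, self.d2_conv.weight * self.d2_mask,
                      self.d2_conv.bias, padding=self.d2_conv.padding)
        out = torch.cat([self.h_conv(x), self.v_conv(x), d1, d2], dim=1)
        return x + self.fuse(out)  # residual connection preserves local detail


if __name__ == "__main__":
    lcm = DirectionalLinearConv(channels=64)
    feats = torch.randn(1, 64, 128, 128)   # e.g. an encoder feature map
    print(lcm(feats).shape)                 # torch.Size([1, 64, 128, 128])
```

Strip kernels of this kind concentrate their parameters along a single orientation, which is why they are commonly used for thin, elongated structures such as roads.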