Multiscale Cascaded Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Times cited: 0

Authors:

Zhang, Xiaolu [1]

Wang, Zhaoshun [1]

Wei, Anlei [2,3]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China

[2] China Unicom Digital Technol Co Ltd, Syst Integrat Business Dept, Beijing, Peoples R China

[3] Southeast Univ, Sch Cyber Sci Engn, Nanjing, Peoples R China

Source:

CANADIAN JOURNAL OF REMOTE SENSING | 2023 / Vol. 49 / Issue 01

Keywords:

ATTENTION;

DOI:

10.1080/07038992.2023.2255068

Chinese Library Classification (CLC) number:

TP7 [Remote sensing technology]

Subject classification codes:

081102; 0816; 081602; 083002; 1404

Abstract:

Semantic segmentation of remote sensing images is challenging because they have complex backgrounds and objects of widely varying sizes. This study proposes a multiscale cascaded network (MSCNet) for semantic segmentation. The input remote sensing image is used at three resolutions, 1, 1/2, and 1/4 of its original size, which represent high, medium, and low resolution, respectively. First, three backbone networks extract features at these different resolutions. Then, the features are fused using a multiscale attention network, and the fused features are fed into a dense atrous spatial pyramid pooling network to obtain multiscale information. The proposed MSCNet introduces multiscale feature extraction and attention mechanism modules suited to remote sensing land-cover classification. Experiments are performed on the DeepGlobe, Vaihingen, and Potsdam datasets, and the results are compared with those of existing classical semantic segmentation networks. On the DeepGlobe dataset, the mean intersection over union (mIoU) of MSCNet is 4.73% higher than that of DeepLabv3+. On the Vaihingen dataset, the mIoU of MSCNet is 15.3% and 6.4% higher than those of SegNet and DeepLabv3+, respectively. On the Potsdam dataset, the mIoU of MSCNet is higher than those of a fully convolutional network (FCN), Res-U-Net, SegNet, and DeepLabv3+ by 11.18%, 5.89%, 4.78%, and 3.03%, respectively.
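The abstract names two concrete, checkable ingredients: the three input resolutions (1, 1/2, 1/4) and the mIoU metric behind every comparison. The sketch below illustrates both under stated assumptions. Downsampling is done by 2x2 average pooling, which is an illustrative choice since the abstract does not say how MSCNet resamples its input. mIoU uses the common definition IoU_c = TP_c / (TP_c + FP_c + FN_c) averaged over classes, skipping classes absent from both prediction and ground truth (conventions differ between benchmarks). The names `downsample_2x`, `input_pyramid` and `mean_iou` are hypothetical and not taken from the paper; the backbones, the multiscale attention network and the dense atrous spatial pyramid pooling module are not modelled.

```python
from typing import List

# A grayscale image as a list of rows (hypothetical representation).
Image = List[List[float]]


def downsample_2x(img: Image) -> Image:
    """Halve height and width by averaging non-overlapping 2x2 blocks.

    Assumes even height and width. 2x2 average pooling is an illustrative
    choice; the abstract does not state how MSCNet resamples its input.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def input_pyramid(img: Image) -> List[Image]:
    """Return the image at scales 1, 1/2 and 1/4 (high, medium, low)."""
    half = downsample_2x(img)
    quarter = downsample_2x(half)
    return [img, half, quarter]


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean intersection over union over classes, for flattened label maps.

    IoU_c = TP_c / (TP_c + FP_c + FN_c). Classes absent from both the
    prediction and the ground truth are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    levels = input_pyramid(img)
    print([(len(x), len(x[0])) for x in levels])  # [(4, 4), (2, 2), (1, 1)]
    # Per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.
    print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))  # 0.5
```

Building the lower levels by repeated 2x downsampling keeps the three inputs spatially aligned, so features from the 1/2 and 1/4 branches can later be upsampled and fused with the full-resolution branch; how MSCNet performs that fusion is described only at a high level in the abstract.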

References

Pages: 12

Total: 25 references (entries 1-10 shown)

[1] Badrinarayanan, Vijay; Kendall, Alex; Cipolla, Roberto. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39(12): 2481-2495
[2] Chen LC, 2014, SEMANTIC IMAGE SEGME, DOI 10.48550/ARXIV.1412.7062
[3] Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [C]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211: 833-851
[4] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40(04): 834-848
[5] Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[6] Chen S., 2021, INT C PION COMP SCI, P303
[7] Demir I, 2018, Arxiv, DOI 10.48550/ARXIV.1805.06561
[8] Hu J, 2018, PROC CVPR IEEE, P7132, DOI 10.1109/CVPR.2018.00745
[9] Li, Xiao; Lei, Lin; Kuang, Gangyao. Multilevel Adaptive-Scale Context Aggregating Network for Semantic Segmentation in High-Resolution Remote Sensing Images [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[10] Long J, 2015, PROC CVPR IEEE, P3431, DOI 10.1109/CVPR.2015.7298965
