Multiscale Cascaded Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Cited by: 0
Authors
Zhang, Xiaolu [1]
Wang, Zhaoshun [1]
Wei, Anlei [2, 3]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China
[2] China Unicom Digital Technol Co Ltd, Syst Integrat Business Dept, Beijing, Peoples R China
[3] Southeast Univ, Sch Cyber Sci Engn, Nanjing, Peoples R China
Keywords
ATTENTION;
DOI
10.1080/07038992.2023.2255068
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Subject classification codes
081102; 0816; 081602; 083002; 1404
Abstract
Semantic segmentation of remote sensing images is challenging because of their complex backgrounds and varying object sizes. This study proposes a multiscale cascaded network (MSCNet) for semantic segmentation. The input remote sensing images are used at 1, 1/2, and 1/4 of their original resolution, representing high, medium, and low resolution, respectively. First, three backbone networks extract features at these different resolutions. Then, the features fused by a multiscale attention network are fed into a dense atrous spatial pyramid pooling network to obtain multiscale information. The proposed MSCNet introduces multiscale feature extraction and attention mechanism modules suited to remote sensing land-cover classification. Experiments are performed on the DeepGlobe, Vaihingen, and Potsdam datasets, and the results are compared with those of existing classical semantic segmentation networks. On the DeepGlobe dataset, the mean intersection over union (mIoU) of MSCNet is 4.73% higher than that of DeepLabv3+. On the Vaihingen dataset, the mIoU of MSCNet is 15.3% and 6.4% higher than those of SegNet and DeepLabv3+, respectively. On the Potsdam dataset, the mIoU of MSCNet is higher than those of a fully convolutional network (FCN), Res-U-Net, SegNet, and DeepLabv3+ by 11.18%, 5.89%, 4.78%, and 3.03%, respectively.
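All results above are reported as mIoU. This record does not state exactly how the authors compute it; a common definition is, per class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over the C classes as mIoU = (1/C) * sum_c IoU_c. The sketch below is a minimal pure-Python illustration under that assumption, not the authors' evaluation code. The helpers `confusion_matrix`, `per_class_iou`, and `mean_iou` are hypothetical names; skipping classes that appear in neither the prediction nor the ground truth is one convention among several, and ignore labels are not handled.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN).

    Assumed convention: a class absent from both prediction and ground truth
    (zero denominator) is skipped rather than counted as 0 or 1.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # ground truth is c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Hypothetical helper: mIoU over flattened label maps (one class index per pixel)."""
    ious = per_class_iou(confusion_matrix(pred, target, num_classes))
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]  # toy ground-truth labels for six pixels
    pred = [0, 1, 1, 1, 2, 0]    # toy predicted labels
    print(per_class_iou(confusion_matrix(pred, target, 3)))
    print(mean_iou(pred, target, 3))
```

On this toy input the per-class IoUs are 1/3, 2/3, and 1/2, so the mIoU is 0.5. Accumulating a confusion matrix first lets a real evaluation sum counts over many image tiles before computing IoU, instead of averaging per-tile scores.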
Pages: 12
Related papers (25 in total)
  • [1] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
    Badrinarayanan, Vijay
    Kendall, Alex
    Cipolla, Roberto
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2481 - 2495
  • [2] Chen LC, 2014, SEMANTIC IMAGE SEGME, DOI 10.48550/ARXIV.1412.7062
  • [3] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
    Chen, Liang-Chieh
    Zhu, Yukun
    Papandreou, George
    Schroff, Florian
    Adam, Hartwig
    [J]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 833 - 851
  • [4] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [5] Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
  • [6] Chen S., 2021, INT C PION COMP SCI, P303
  • [7] Demir I, 2018, arXiv, DOI 10.48550/ARXIV.1805.06561
  • [8] Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/CVPR.2018.00745, 10.1109/TPAMI.2019.2913372]
  • [9] Multilevel Adaptive-Scale Context Aggregating Network for Semantic Segmentation in High-Resolution Remote Sensing Images
    Li, Xiao
    Lei, Lin
    Kuang, Gangyao
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [10] Long J, 2015, PROC CVPR IEEE, P3431, DOI 10.1109/CVPR.2015.7298965