A Framework Integrating DeeplabV3+, Transfer Learning, Active Learning, and Incremental Learning for Mapping Building Footprints

Cited by: 14

Authors:

Li, Zhichao [1]

Dong, Jinwei [1]

Affiliations:

[1] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, Key Lab Land Surface Pattern & Simulat, Beijing 100101, Peoples R China

Source:

REMOTE SENSING | 2022 / Volume 14 / Issue 19

Keywords:

building footprint mapping; DeepLabV3+; active learning; incremental learning; transfer learning; SEMANTIC SEGMENTATION; ANNOTATION; EXTRACTION; IMAGES;

DOI:

10.3390/rs14194738

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Convolutional neural network (CNN)-based remote sensing (RS) image segmentation has become a widely used method for building footprint mapping. Recently, DeeplabV3+, an advanced CNN architecture, has shown satisfactory performance for building extraction in different urban landscapes. However, it faces challenges due to the large amount of labeled data required for model training and the extremely high costs associated with the annotation of unlabeled data. These challenges encouraged us to design a framework for building footprint mapping with fewer labeled data. In this context, the published studies on RS image segmentation are reviewed first, with a particular emphasis on the use of active learning (AL), incremental learning (IL), transfer learning (TL), and their integration for reducing the cost of data annotation. Based on the literature review, we defined three candidate frameworks by integrating AL strategies (i.e., margin sampling, entropy, and vote entropy), IL, TL, and DeeplabV3+. They examine the efficacy of AL, the efficacy of IL in accelerating AL performance, and the efficacy of both IL and TL in accelerating AL performance, respectively. Additionally, these frameworks enable the iterative selection of image tiles to be annotated, training and evaluation of DeeplabV3+, and quantification of the landscape features of selected image tiles. Then, all candidate frameworks were examined using the WHU aerial building dataset, as it has sufficient (i.e., 8188) labeled image tiles with representative buildings (i.e., various building densities, areas, roof colors, and shapes).
The results support our theoretical analysis: (1) all three AL strategies reduced the number of image tiles required by selecting the most informative image tiles, and no significant differences were observed in their performance; (2) image tiles with more buildings and larger building area proved to be informative for the three AL strategies and were prioritized during the data selection process; (3) IL can expedite model training by accumulating knowledge from chosen labeled tiles; (4) TL provides a better initial learner by incorporating knowledge from a pre-trained model; (5) DeeplabV3+ combined with IL, TL, and AL has the best performance in reducing the cost of data annotation. It achieved good performance (i.e., mIoU of 0.90) using only 10-15% of the sample dataset, whereas DeeplabV3+ alone needs 50% of the sample dataset to reach equivalent performance. The proposed frameworks built on DeeplabV3+ and the results imply that integrating TL, AL, and IL in human-in-the-loop building extraction could be considered in real-world applications, especially for building footprint mapping.
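The three AL strategies named in the abstract (margin sampling, entropy, and vote entropy) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the averaging of per-pixel uncertainty into one score per tile, and the use of several model outputs as the voting committee are all assumptions made for illustration; the abstract does not say how the paper aggregates scores or builds its committee.

```python
import numpy as np


def margin_score(probs):
    """Margin sampling on one tile.

    probs: (H, W, C) softmax output. A small gap between the two most
    likely classes means high uncertainty, so we return 1 - mean margin
    (assumed aggregation) so that higher means more informative.
    """
    top2 = np.sort(probs, axis=-1)[..., -2:]
    margin = top2[..., 1] - top2[..., 0]
    return float(1.0 - margin.mean())


def entropy_score(probs, eps=1e-12):
    """Mean per-pixel Shannon entropy of the class probabilities."""
    ent = -(probs * np.log(probs + eps)).sum(axis=-1)
    return float(ent.mean())


def vote_entropy_score(committee_probs, eps=1e-12):
    """Vote entropy on one tile.

    committee_probs: (M, H, W, C) outputs of M committee members
    (hypothetical committee; how it is built is an assumption).
    Each member casts a hard vote per pixel; the entropy of the
    vote fractions measures disagreement.
    """
    n_classes = committee_probs.shape[-1]
    votes = committee_probs.argmax(axis=-1)  # (M, H, W)
    frac = np.stack(
        [(votes == c).mean(axis=0) for c in range(n_classes)], axis=-1
    )  # (H, W, C) fraction of members voting for each class
    ent = -(frac * np.log(frac + eps)).sum(axis=-1)
    return float(ent.mean())


def select_tiles(scores, k):
    """Indices of the k highest-scoring tiles, to be sent for annotation."""
    order = np.argsort(scores)[::-1]
    return order[:k].tolist()
```

In an iterative loop of the kind the abstract describes, each unlabeled tile would be scored with one of these functions, the top-k tiles annotated and added to the training set, and the model updated before the next round.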

Pages: 18