Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative

Times cited: 141
Authors
Zhu, Zhe [1, 5]
Gallant, Alisa L. [2]
Woodcock, Curtis E. [3]
Pengra, Bruce [4]
Olofsson, Pontus [3]
Loveland, Thomas R. [2]
Jin, Suming [5]
Dahal, Devendra [4]
Yang, Limin [4]
Auch, Roger F. [2]
Affiliations
[1] Texas Tech Univ, Dept Geosci, MS 1053, Sci Bldg 125, Lubbock, TX 79409 USA
[2] US Geol Survey, Earth Resources Observat & Sci EROS Ctr, 47914 252nd St, Sioux Falls, SD 57198 USA
[3] Boston Univ, Dept Earth & Environm, 685 Commonwealth Ave, Boston, MA 02215 USA
[4] US Geol Survey, SGT, Earth Resources Observat & Sci EROS Ctr, 47914 252nd St, Sioux Falls, SD 57198 USA
[5] US Geol Survey, ASRC InuTeq, Earth Resources Observat & Sci EROS Ctr, 47914 252nd St, Sioux Falls, SD 57198 USA
Keywords
Continuous Change Detection and Classification (CCDC); Training strategy; Auxiliary data; Land cover classification; Landsat; Random forest classification; Ancillary data; Image classification; Cloud shadow; Sample selection; Neural network; Snow detection; United States; Accuracy; TM
DOI
10.1016/j.isprsjprs.2016.11.004
Chinese Library Classification (CLC) number
P9 [Physical geography]
Discipline code
0705; 070501
Abstract
The U.S. Geological Survey's Land Change Monitoring, Assessment, and Projection (LCMAP) initiative is a new end-to-end capability to continuously track and characterize changes in land cover, use, and condition to better support research and applications relevant to resource management and environmental change. Among the LCMAP product suite are annual land cover maps that will be available to the public. This paper describes an approach to optimize the selection of training and auxiliary data for deriving thematic land cover maps from all available clear observations from Landsats 4-8. Training data were selected from map products of the U.S. Geological Survey's Land Cover Trends project. The Random Forest classifier was applied to different classification scenarios based on the Continuous Change Detection and Classification (CCDC) algorithm. We found that extracting training data in proportion to the occurrence of land cover classes was superior to an equal distribution of training data per class, and we suggest using a total of 20,000 training pixels to classify an area about the size of a Landsat scene. The problem of unbalanced training data was alleviated by extracting a minimum of 600 and a maximum of 8000 training pixels per class. We also explored removing outliers from the training data based on spectral and spatial criteria, but observed no significant improvement in classification results. Finally, we tested the importance of different types of auxiliary data available for the conterminous United States, including: (a) five variables used by the National Land Cover Database, (b) three variables from the cloud screening "Function of mask" (Fmask) statistics, and (c) two variables from the change detection results of CCDC.
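The per-class allocation rule summarized in the abstract (proportional to class occurrence, about 20,000 pixels in total, clamped to 600-8000 pixels per class) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `allocate_training_pixels` and the example class counts are hypothetical, class frequencies are assumed to come from a reference map such as a Land Cover Trends product, and a class with fewer available pixels than the minimum is assumed to contribute all of them.

```python
def allocate_training_pixels(class_counts, total=20000,
                             min_per_class=600, max_per_class=8000):
    """Return how many training pixels to draw for each class.

    Hypothetical helper: each class receives a share of `total` proportional
    to its pixel count in the reference map, clamped to
    [min_per_class, max_per_class], and never more pixels than it has.
    """
    n_all = sum(class_counts.values())
    if n_all == 0:
        raise ValueError("reference map contains no labelled pixels")
    allocation = {}
    for label, n_class in class_counts.items():
        proportional = round(total * n_class / n_all)
        clamped = min(max(proportional, min_per_class), max_per_class)
        # A rare class may have fewer pixels than the minimum; take all of them.
        allocation[label] = min(clamped, n_class)
    return allocation


# Hypothetical pixel counts per class in a scene-sized reference map.
reference_counts = {"forest": 600_000, "grassland": 250_000,
                    "water": 100_000, "developed": 45_000, "barren": 5_000}
print(allocate_training_pixels(reference_counts))
```

For these made-up counts the sketch allocates 8000, 5000, 2000, 900, and 600 pixels (16,500 in total): the cap trims the dominant class and the floor lifts the rarest one, so the sum drifts away from 20,000. The abstract does not say whether that difference is redistributed, so the sketch leaves it as is.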
We found that auxiliary variables such as a Digital Elevation Model (DEM) and its derivatives (aspect, position index, and slope), potential wetland index, water probability, snow probability, and cloud probability improved the accuracy of land cover classification. Compared with the original strategy of the CCDC algorithm (500 pixels per class), the optimal strategy improved classification accuracy substantially (a 15-percentage-point increase in overall accuracy and a 4-percentage-point increase in minimum accuracy).
© 2016 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
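The gains above are reported as overall accuracy and "minimum accuracy". The sketch below shows one way to compute both from a confusion matrix. It rests on assumptions the abstract does not confirm: rows are taken as map (predicted) classes and columns as reference classes, "minimum accuracy" is taken to be the lowest of all per-class user's and producer's accuracies, and accuracies are plain sample proportions rather than area-adjusted estimates. The function name `accuracy_summary` and the example matrix are hypothetical.

```python
def accuracy_summary(confusion):
    """Return (overall accuracy, assumed minimum accuracy) for a square
    confusion matrix given as a list of rows.

    Assumed convention: confusion[i][j] counts samples mapped as class i
    whose reference label is class j. "Minimum accuracy" is taken to be the
    lowest per-class user's or producer's accuracy (an assumption).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    overall = correct / total

    per_class = []
    for i in range(k):
        mapped = sum(confusion[i])                       # samples mapped as class i
        reference = sum(confusion[r][i] for r in range(k))  # reference samples of class i
        if mapped:
            per_class.append(confusion[i][i] / mapped)      # user's accuracy
        if reference:
            per_class.append(confusion[i][i] / reference)   # producer's accuracy
    return overall, min(per_class)


# Hypothetical three-class confusion matrix.
matrix = [[50, 5, 0],
          [10, 30, 5],
          [0, 5, 45]]
print(accuracy_summary(matrix))
```

For this made-up matrix, overall accuracy is 125/150 (about 0.83) and the weakest figure is the user's accuracy of the second class, 30/45 (about 0.67). A "15-percentage-point increase" refers to an absolute change in such proportions (for example, from 0.70 to 0.85), not a 15% relative change.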
Pages: 206-221
Page count: 16