Random forests and stochastic gradient boosting for predicting tree canopy cover: comparing tuning processes and model performance

Cited by: 159
Authors
Freeman, Elizabeth A. [1]
Moisen, Gretchen G. [1]
Coulston, John W. [2]
Wilson, Barry T. [3]
Affiliations
[1] USDA Forest Serv, Rocky Mt Res Stn, 507 25th St, Ogden, UT 84401 USA
[2] USDA Forest Serv, Southern Res Stn, 1710 Res Ctr Dr, Blacksburg, VA 24060 USA
[3] USDA Forest Serv, No Res Stn, 1992 Folwell Ave, St Paul, MN 55108 USA
Keywords
tree canopy cover; predictive mapping; classification and regression trees; random forest; stochastic gradient boosting; ABOVEGROUND BIOMASS; REGRESSION TREES; CLASSIFICATION; IMAGERY; DATABASE; FISH
DOI
10.1139/cjfr-2014-0562
Chinese Library Classification number
S7 [Forestry]
Discipline classification code
0829; 0907
Abstract
As part of the development of the 2011 National Land Cover Database (NLCD) tree canopy cover layer, a pilot project was launched to test the use of high-resolution photography coupled with extensive ancillary data to map the distribution of tree canopy cover over four study regions in the conterminous US. Two stochastic modeling techniques, random forests (RF) and stochastic gradient boosting (SGB), are compared. The objectives of this study were first to explore the sensitivity of RF and SGB to choices in tuning parameters and, second, to compare the performance of the two final models by assessing the importance of, and interaction between, predictor variables, the global accuracy metrics derived from an independent test set, as well as the visual quality of the resultant maps of tree canopy cover. The predictive accuracy of RF and SGB was remarkably similar on all four of our pilot regions. In all four study regions, the independent test set mean squared error (MSE) was identical to three decimal places, with the largest difference in Kansas where RF gave an MSE of 0.0113 and SGB gave an MSE of 0.0117. With correlated predictor variables, SGB had a tendency to concentrate variable importance in fewer variables, whereas RF tended to spread importance among more variables. RF is simpler to implement than SGB, as RF has fewer parameters needing tuning and also was less sensitive to these parameters. As stochastic techniques, both RF and SGB introduce a new component of uncertainty: repeated model runs will potentially result in different final predictions. We demonstrate how RF allows the production of a spatially explicit map of this stochastic uncertainty of the final model.
Pages: 323-339
Page count: 17
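
The abstract compares the two models by mean squared error (MSE) on an independent test set and notes that repeated runs of a stochastic model can produce different predictions. The following is a minimal sketch of both calculations, not the authors' implementation. The function names, the toy data, and the use of a per-pixel standard deviation across repeated runs as the uncertainty measure are all assumptions made for illustration; the abstract does not say how the uncertainty map was computed.

```python
from statistics import mean, pstdev


def mse(observed, predicted):
    """Mean squared error between observed and predicted canopy cover.

    Both arguments are equal-length sequences of proportions in [0, 1].
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


def per_pixel_spread(runs):
    """Standard deviation of the predictions at each pixel across runs.

    `runs` is a list of prediction lists, one list per repeated model run.
    This spread is an assumed stand-in for stochastic uncertainty.
    """
    return [pstdev(pixel_values) for pixel_values in zip(*runs)]


# Hypothetical toy data: canopy cover proportions at five test pixels.
observed = [0.10, 0.50, 0.80, 0.00, 0.30]
repeated_runs = [
    [0.12, 0.45, 0.75, 0.05, 0.33],
    [0.14, 0.47, 0.73, 0.03, 0.31],
    [0.10, 0.49, 0.77, 0.04, 0.35],
]

# Test-set MSE of each run, then the spread of predictions at each pixel.
for run_index, predictions in enumerate(repeated_runs, start=1):
    print(f"run {run_index}: MSE = {mse(observed, predictions):.4f}")
print("per-pixel spread:", [round(s, 4) for s in per_pixel_spread(repeated_runs)])
```

Mapping the per-pixel spread back onto pixel coordinates would give a spatially explicit picture of how much predictions vary between runs.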