Identifying appropriate spatial scales of predictors in species distribution models with the random forest algorithm

Times cited: 95
Authors
Bradter, Ute [1]
Kunin, William E. [1]
Altringham, John D. [1]
Thom, Tim J. [2]
Benton, Tim G. [1]
Affiliations
[1] Univ Leeds, Sch Biol, Leeds LS2 9JT, W Yorkshire, England
[2] Yorkshire Dales Natl Pk Author, Grassington BD23 5LB, England
Source
METHODS IN ECOLOGY AND EVOLUTION | 2013, Vol. 4, Issue 2
Keywords
curlew; landscape scale; machine learning; multiple spatial scales; patch; scale selection; spatial autocorrelation; variable selection; Yorkshire Dales; wader; VARIABLE IMPORTANCE MEASURES; HABITAT; AUTOCORRELATION; CLASSIFICATION; LANDSCAPE; SELECTION; CONNECTIVITY; PREFERENCES; DENSITY;
DOI
10.1111/j.2041-210x.2012.00253.x
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification code
071012; 0713;
Abstract
Including predictors in species distribution models at inappropriate spatial scales can decrease the variance explained, add residual spatial autocorrelation (RSA) and lead to the wrong conclusions. Some studies have measured predictors within different buffer sizes (scales) around sample locations, regressed each predictor against the response at each scale and selected the scale with the best model fit as the appropriate scale for this predictor. However, a predictor can influence a species at several scales or show several scales with good model fit due to a bias caused by RSA. This makes the evaluation of all scales with good model fit necessary. With potentially several scales per predictor and multiple predictors to evaluate, the number of predictors can be large relative to the number of data points, potentially impeding variable selection with traditional statistical techniques, such as logistic regression. We trialled a variable selection process using the random forest algorithm, which allows the simultaneous evaluation of several scales of multiple predictors. Using simulated responses, we compared the performance of models resulting from this approach with models using the known predictors at arbitrary and at the known spatial scales. We also apply the proposed approach to a real data set of curlew (Numenius arquata). AIC, AUC and Nagelkerke's pseudo-R² of the models resulting from the proposed variable selection were often very similar to those of the models with the known predictors at known spatial scales. Only two of nine models required the addition of spatial eigenvectors to account for RSA. Arbitrary scale models always required the addition of spatial eigenvectors. 75% (50–100%) of the known predictors were selected at scales similar to the known scale (within 3 km). In the curlew model, predictors at large, medium and small spatial scales were selected, suggesting that for appropriate landscape-scale models multiple scales need to be evaluated.
The proposed approach selected several of the correct predictors at appropriate spatial scales out of 544 possible predictors. Thus, it facilitates the evaluation of multiple spatial scales of multiple predictors against each other in landscape-scale models.
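The buffer-based measurement of predictors described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy habitat grid, the site coordinates, the radii and the function names (`buffer_mean`, `multiscale_predictors`) are all hypothetical, and distances are measured in grid cells rather than kilometres. It only shows how one candidate predictor column arises per (predictor, scale) pair, which is why the number of candidates grows quickly.

```python
import math


def buffer_mean(grid, row, col, radius_cells):
    """Mean value of the grid cells whose centres lie within
    radius_cells of the cell (row, col). Cells outside the grid are ignored."""
    reach = int(math.floor(radius_cells))
    total, count = 0.0, 0
    for i in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for j in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            if (i - row) ** 2 + (j - col) ** 2 <= radius_cells ** 2:
                total += grid[i][j]
                count += 1
    return total / count  # count >= 1 because the centre cell always qualifies


def multiscale_predictors(layers, sites, radii):
    """Build a predictor table with one column per (layer, radius) pair
    and one row per sample site."""
    names = [f"{name}_r{radius}" for name in layers for radius in radii]
    rows = [
        [buffer_mean(layers[name], row, col, radius)
         for name in layers for radius in radii]
        for (row, col) in sites
    ]
    return names, rows


# Hypothetical binary habitat layer (1 = habitat present, 0 = absent).
grassland = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
layers = {"grassland": grassland}
sites = [(0, 0), (2, 2)]   # hypothetical sample locations (row, col)
radii = [1, 2]             # hypothetical buffer radii, in cells

names, table = multiscale_predictors(layers, sites, radii)
for site, values in zip(sites, table):
    print(site, dict(zip(names, values)))
```

A table built this way could then be passed to a random forest implementation to rank the columns by variable importance; that step is not shown here because it depends on the library used.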
Pages: 167-174
Number of pages: 8