Random forests as a tool for ecohydrological distribution modelling

Cited by: 292
Authors
Peters, Jan
De Baets, Bernard
Verhoest, Niko E. C.
Samson, Roeland
Degroeve, Sven
De Becker, Piet
Huybrechts, Willy
Affiliations
[1] Univ Ghent, Dept Forest & Water Management, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Appl Math Biometr & Proc Control, B-9000 Ghent, Belgium
[3] Univ Ghent, Dept Appl Ecol & Environm Biol, B-9000 Ghent, Belgium
[4] Inst Nat Conservat, Res Grp Ecohydrol & Water Syst, B-1070 Brussels, Belgium
Keywords
vegetation model; random forest; classification tree; logistic regression; generalized linear model; ecohydrology
DOI
10.1016/j.ecolmodel.2007.05.011
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification codes
071012; 0713
Abstract
An important issue in ecohydrological research is distribution modelling, aiming at the prediction of species or vegetation type occurrence on the basis of empirical relations with hydrological or hydrogeochemical habitat conditions. In this study, two statistical techniques are evaluated: (i) the widely used multiple logistic regression technique in the generalized linear modelling framework, and (ii) a recently developed machine learning technique called 'random forests'. The latter is an ensemble learning technique that generates many classification trees and aggregates the individual results. The two different techniques are used to develop distribution models to predict the vegetation type occurrence of 11 groundwater-dependent vegetation types in Belgian lowland valley ecosystems based on spatially distributed measurements of environmental conditions. The spatially distributed data set under investigation consists of 1705 grid cells covering an area of 47.32 ha. After model construction and calibration, both models are applied to independent test data sets using two-fold cross-validation and resulting probabilities of occurrence are used to predict vegetation type distributions within the study area. Predicted vegetation types are compared with observations, and the McNemar test indicates an overall better performance of the random forest model at the 0.001 significance level. Comparison of the modelling results for each individual vegetation type separately by means of the F-measure, which combines precision and recall, also reveals better predictions by the random forest model. Inspection of the probabilities of occurrence of the different vegetation types for each grid cell demonstrates that correct predictions in central areas of homogeneous vegetation sites are based on high probabilities, whereas the confidence decreases towards the margins of these areas. 
Threshold-independent evaluation of model accuracy by means of the area under the receiver operating characteristic (ROC) curve confirms good performance of both models, with higher values for the random forest model. Incorporating the random forest technique in distribution models can therefore lead to better model performance. © 2007 Elsevier B.V. All rights reserved.
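The two comparison measures named in the abstract, the per-class F-measure and the McNemar test, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy labels, and the use of a continuity correction in the McNemar statistic are assumptions made for illustration only; the abstract does not state which variant of the test was applied.

```python
import math


def f_measure(observed, predicted, label):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == label and p == label)
    fp = sum(1 for o, p in pairs if o != label and p == label)
    fn = sum(1 for o, p in pairs if o == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcnemar(observed, pred_a, pred_b):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    Assumption: the continuity-corrected form is used here.
    """
    # Count the cells on which exactly one of the two models is correct.
    only_a = sum(1 for o, a, b in zip(observed, pred_a, pred_b)
                 if a == o and b != o)
    only_b = sum(1 for o, a, b in zip(observed, pred_a, pred_b)
                 if a != o and b == o)
    n = only_a + only_b
    if n == 0:
        return 0.0, 1.0  # the models never disagree in correctness
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / n
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: observed vegetation types for six grid cells
# and the predictions of two models.
observed = ["a", "a", "b", "b", "a", "b"]
pred_rf = ["a", "a", "b", "b", "a", "a"]
pred_lr = ["a", "b", "b", "a", "b", "a"]

f_rf = f_measure(observed, pred_rf, "a")
stat, p_value = mcnemar(observed, pred_rf, pred_lr)
```

On real data the observed and predicted labels would come from the held-out folds of the cross-validation, one entry per grid cell, and the F-measure would be computed separately for each of the 11 vegetation types.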
Pages: 304-318
Number of pages: 15