Random forests for classification in ecology

Cited by: 3418
Authors
Cutler, D. Richard [1]
Edwards, Thomas C., Jr.
Beard, Karen H.
Cutler, Adele
Hess, Kyle T.
Affiliations
[1] Utah State Univ, Dept Math & Stat, Logan, UT 84322 USA
[2] Utah State Univ, Utah Cooperat Fish & Wildlife Res Unit, US Geol Survey, Logan, UT 84322 USA
[3] Utah State Univ, Dept Wildland Resources & Ecol Ctr, Logan, UT 84322 USA
[4] Utah State Univ, Dept Math & Stat, Logan, UT 84322 USA
[5] Utah State Univ, Dept Wildland Resources, Logan, UT 84322 USA
[6] Univ Washington, Coll Forest Resources, Seattle, WA 98195 USA
Keywords
additive logistic regression; classification trees; LDA; logistic regression; machine learning; partial dependence plots; random forests; species distribution models
DOI
10.1890/07-0539.1
Chinese Library Classification (CLC) number
Q14 [Ecology (bioecology)]
Subject classification code
071012; 0713
Abstract
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.
Pages: 2783-2792
Page count: 10