Modelling post-fire tree mortality: Can random forest improve discrimination of imbalanced data?

Cited by: 33
Authors
Shearman, Timothy M. [1]
Varner, J. Morgan [2]
Hood, Sharon M. [3]
Cansler, C. Alina [1, 3]
Hiers, J. Kevin [2]
Affiliations
[1] Univ Washington, Sch Environm & Forest Sci, Seattle, WA 98195 USA
[2] Tall Timbers Res Stn, Tallahassee, FL 32312 USA
[3] US Forest Serv, USDA, Rocky Mt Res Stn, Missoula, MT 59808 USA
Keywords
Fire effects; Logistic regression; Machine learning; Model evaluation; Model validation; Pinus palustris; Prescribed fire; PONDEROSA PINE; PRESCRIBED FIRES; CLASSIFICATION; CONIFERS; WILDFIRE; EVALUATE; OREGON;
DOI
10.1016/j.ecolmodel.2019.108855
Chinese Library Classification (CLC) number
Q14 [Ecology (bioecology)];
Discipline classification code
071012; 0713;
Abstract
Predicting post-fire tree mortality is a major area of research in fire-prone forests, woodlands, and savannas worldwide. Past research has relied overwhelmingly on logistic regression analysis (LR) that predicts post-fire tree status as a binary outcome (i.e., living or dead). One of the most problematic issues for LR (or any classification problem) occurs when there is a class imbalance in the training data. In these instances, predictions will be biased toward the majority class. Using a historical prescribed fire data set of longleaf pines (Pinus palustris) from northern Florida, USA, we compare results from standard LR and the machine-learning algorithm, random forest (RF). First, we demonstrate the class imbalance problem using simulated data. We then show how a balanced RF model can be used to alleviate the bias in the model and improve mortality prediction results. In the simulated example, LR model sensitivity and specificity were clearly biased according to the degree of imbalance between the classes. The balanced RF models had consistent sensitivity and specificity throughout the simulated data sets. Re-analyzing the original longleaf pine data set with a balanced RF model showed that although both LR and RF models had similar areas under the receiver operating characteristic curve (AUC), the RF model had better discrimination for predicting new observations of dead trees. Both LR and RF models identified duff consumption and percent crown scorch as important predictors of tree mortality; however, the RF model also suggested prefire duff depth as an important predictor. Our analysis highlights LR limitations when data are imbalanced and supports using RF to develop post-fire tree mortality models. We suggest how RF can be incorporated into future tree mortality studies, as well as possible implementation in future decision-support tools.
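The class imbalance problem described in the abstract can be illustrated with a small standard-library Python sketch. It is not the authors' code or data: the labels, the 5% mortality rate, and the helper names `sensitivity_specificity` and `balanced_bootstrap` are all hypothetical and chosen only for illustration. The first helper shows why overall accuracy hides a bias toward the majority class. The second shows the general idea commonly associated with a balanced random forest, assumed here to mean that each tree is grown on a bootstrap sample containing equal numbers of observations from each class.

```python
import random


def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for two equal-length label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def balanced_bootstrap(labels, rng):
    """Draw row indices with replacement, the same number from every class.

    The per-class sample size is the size of the smallest class, so the
    majority class is down-sampled in each draw.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


# Hypothetical data: 1 = dead, 0 = alive, with 5% mortality.
labels = [1] * 50 + [0] * 950

# A "model" that always predicts the majority class (alive).
always_alive = [0] * len(labels)
accuracy = sum(a == p for a, p in zip(labels, always_alive)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_alive)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")

# One balanced bootstrap sample, as would be drawn for a single tree.
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
n_dead = sum(labels[i] for i in sample)
n_alive = len(sample) - n_dead
print(f"balanced sample: dead={n_dead} alive={n_alive}")
```

In this toy case the majority-class predictor is right for 95% of trees yet detects no dead trees, which is why sensitivity and specificity are reported separately when classes are imbalanced.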
Pages: 8