Predicting insect outbreaks using machine learning: A mountain pine beetle case study

Cited by: 24
Authors
Ramazi, Pouria [1, 2]
Kunegel-Lion, Melodie [3]
Greiner, Russell [2, 4]
Lewis, Mark A. [1, 3]
Affiliations
[1] Univ Alberta, Dept Math & Stat Sci, Edmonton, AB, Canada
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[3] Univ Alberta, Dept Biol Sci, Edmonton, AB, Canada
[4] Alberta Machine Intelligence Inst, Edmonton, AB, Canada
Source
ECOLOGY AND EVOLUTION | 2021 / Volume 11 / Issue 19
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
future infestations; insect spread; machine learning; mountain pine beetle; predictive ecology; temporal prediction; BAYESIAN NETWORKS; CLIMATE-CHANGE; AUTOCORRELATION; POPULATIONS; PERFORMANCE; RELEVANCE; SELECTION; MOVEMENT; PATTERNS; ECOLOGY
DOI
10.1002/ece3.7921
Chinese Library Classification
Q14 [Ecology (Bioecology)]
Discipline classification codes
071012; 0713
Abstract
Planning forest management relies on predicting insect outbreaks such as those of mountain pine beetle, particularly in the intermediate-term future, e.g., 5 years ahead. Machine-learning algorithms are potential solutions to this challenging problem because of their many successes across a variety of prediction tasks. However, applying them raises many subtle challenges: identifying the best learning models and the best subset of available covariates (including time lags), and evaluating the models properly to avoid misleading performance measures. We systematically address these issues in predicting the chance of a mountain pine beetle outbreak in the Cypress Hills area and seek models with the best performance at predicting future 1-, 3-, 5-, and 7-year infestations. We train nine machine-learning models, including two generalized boosted regression trees (GBM) that predict future 1- and 3-year infestations with 92% and 88% AUC, and two novel mixed models that predict future 5- and 7-year infestations with 86% and 84% AUC, respectively. We also consider forming the train and test datasets by splitting the original dataset randomly rather than using the appropriate year-based approach, and show that this may yield models that score high on the test dataset but low in practice, resulting in inaccurate performance evaluations. For example, a k-nearest neighbor model with an actual performance of 68% AUC scores a misleadingly high 78% on a test dataset obtained from a random split, but a more accurate 66% on a year-based split. We then investigate how prediction accuracy varies with the history length of the covariates provided and find that neural network and naive Bayes models predict more accurately as history length increases, particularly for future 1- and 3-year predictions, and that roughly the same holds for GBM. Our approach is applicable to other invasive species. The resulting predictors can be used in planning forest and pest management and in planning sampling locations in field studies.
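
The contrast between a year-based split and a random split, and the AUC measure reported above, can be illustrated with a minimal sketch. This is not the authors' code: the record layout (`year`, `cell`, `infested`), the toy data, the cutoff year, and all function names are hypothetical, chosen only to show the idea. A year-based split keeps every test year strictly after every training year, whereas a random split usually lets the same years appear on both sides, which is one way test scores can become misleadingly high.

```python
import random


def year_based_split(records, cutoff_year):
    """Train on years up to cutoff_year, test on strictly later years."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test


def random_split(records, test_fraction=0.3, seed=0):
    """Shuffle all records and cut, ignoring the year of each record."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 years x 5 grid cells with an arbitrary label.
records = [
    {"year": year, "cell": cell, "infested": (year + cell) % 2}
    for year in range(2006, 2016)
    for cell in range(5)
]

train, test = year_based_split(records, cutoff_year=2012)
rand_train, rand_test = random_split(records)

# Years that occur on both sides of each split (empty for the year-based one).
year_overlap = {r["year"] for r in train} & {r["year"] for r in test}
rand_overlap = {r["year"] for r in rand_train} & {r["year"] for r in rand_test}
print("year-based split, shared years:", sorted(year_overlap))
print("random split, shared years:", sorted(rand_overlap))
```

In practice, a model would be fitted on `train` and its predicted scores on `test` passed to `auc`; the sketch stops at the split because the choice of model is outside what the abstract specifies.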
Pages: 13014-13028
Number of pages: 15