Predicting species abundance using machine learning approach: a comparative assessment of random forest spatial variants and performance metrics

Cited by: 2
Authors
Mushagalusa, Ciza Arsene [1, 2]
Fandohan, Adande Belarmain [3]
Kakai, Romain Glele [1]
Affiliations
[1] Univ Abomey Calavi, Fac Sci Agron, Lab Biomath & Estimat Forestieres, 04 PB 1525, Cotonou, Benin
[2] Univ Evangel Africa UEA, Fac Agr & Environm Sci, Bukavu 3323, DEM REP CONGO
[3] Univ Natl Agr, Ecole Foresterie Trop, Unite Rech Foresterie & Conservat Bioressources, BP 43, Ketou, Benin
Keywords
Ecological modelling; Machine learning; Random forest; Spatial analysis; Population estimation; Geographically weighted regression; Imperfect detection; Linear model; Climate; Classification; Interpolation; Autocorrelation; Dependence; Diversity; Powerful
DOI
10.1007/s40808-024-02055-7
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
For informed decision-making in biodiversity conservation and ecological management, accurate predictions of species abundance are essential. This study assessed the predictive performance of random forest (RF) spatial variants in modelling species abundance distribution against standard RF, Poisson, their hybrid methods with ordinary kriging (OK), and the random generalised linear model (RGLM). Model performance in abundance modelling has rarely been quantified with a comprehensive index; single statistical indices are typically used instead. Therefore, modified Taylor diagrams were used to evaluate each model's overall ability to predict spatial patterns of species abundance, taking into account abundance class and detection probability. An exponential correlation function was used to generate spatially correlated random effects, with and without a quadratic term and at two variation strengths. Which RF spatial variant performed best depended on the species abundance class and on the relationship between abundance and the independent variables. Spatial RF variants outperformed conventional modelling in terms of prediction accuracy and power, particularly when spatial autocorrelation and species detection probabilities were high. RF spatial variants were less precise for common species than RGLM and GLM-OK, which better predicted species abundance in cases of low or no spatial autocorrelation. However, no model outperformed the others for all prediction goals, highlighting the need to combine performance metrics when evaluating species abundance distribution models. The study highlights the importance of model specification in ecological research and cautions against the use of RF algorithms as a black box.
Pages: 5145-5171
Number of pages: 27