Too many candidates: Embedded covariate selection procedure for species distribution modelling with the covsel R package

Cited by: 23
Authors
Adde, Antoine [1]
Rey, Pierre-Louis [1]
Fopp, Fabian [2, 3]
Petitpierre, Blaise [1, 4]
Schweiger, Anna K. [5]
Broennimann, Olivier [1, 6]
Lehmann, Anthony [7]
Zimmermann, Niklaus E. [2]
Altermatt, Florian [8, 9]
Pellissier, Loic [2, 3]
Guisan, Antoine [1, 6]
Affiliations
[1] Univ Lausanne, Inst Earth Surface Dynam, Fac Geosci & Environm, CH-1015 Lausanne, Switzerland
[2] Swiss Fed Inst Forest Snow & Landscape Res WSL, Land Change Sci Res Unit, Birmensdorf, Switzerland
[3] Swiss Fed Inst Technol, Dept Environm Syst Sci, Inst Terr Ecosyst, Ecosyst Landscape Evolut, Zurich, Switzerland
[4] Conservatoire & Jardin Bot Gen, InfoFlora, Chambesy, Switzerland
[5] Univ Zurich, Dept Geog Remote Sensing Labs, Zurich, Switzerland
[6] Univ Lausanne, Dept Ecol & Evolut, Lausanne, Switzerland
[7] Univ Geneva, Inst Environm Sci, EnviroSPACE, Geneva, Switzerland
[8] Univ Zurich, Dept Evolutionary Biol & Environm Studies, Zurich, Switzerland
[9] Eawag Swiss Fed Inst Aquat Sci & Technol, Dept Aquat Ecol, Dubendorf, Switzerland
Keywords
Automated covariate selection; Generalized additive model with null-space penalization; Generalized linear model with elastic-net regularization; Guided regularized random forest; Multicollinearity; Predictors; Species distribution models; R package; ENVIRONMENTAL PREDICTORS; VARIABLE SELECTION; REGRESSION; GUIDE
DOI
10.1016/j.ecoinf.2023.102080
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification code
071012; 0713
Abstract
1. Selecting the best subset of covariates from a large panel of candidates is a key and highly influential stage of the species distribution modelling process. Yet there is currently no commonly accepted and widely adopted standard approach for performing this selection.
2. We introduce a two-step "embedded" covariate selection procedure aimed at optimizing the predictive ability and parsimony of species distribution models fitted in the context of a high-dimensional candidate covariate space. The procedure combines a collinearity-filtering algorithm (Step A) with three model-specific embedded regularization techniques (Step B): generalized linear model with elastic-net regularization, generalized additive model with null-space penalization, and guided regularized random forest.
3. We evaluated the embedded covariate selection procedure through an example application aimed at modelling the habitat suitability of 50 species in Switzerland from a suite of 123 candidate covariates. We demonstrated the ability of the embedded covariate selection procedure to provide significantly more accurate species distribution models than those obtained with alternative procedures. Model performance was independent of the characteristics of the species data, such as the number of occurrence records or their spatial distribution across the study area.
4. We implemented and streamlined our embedded covariate selection procedure in the covsel R package, paving the way for a ready-to-use, automated covariate selection tool that was missing in the field of species distribution modelling. All the information required for installing and running the covsel R package is openly available on the GitHub repository https://github.com/N-SDM/covsel.
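The collinearity-filtering idea of Step A can be illustrated with a minimal sketch. This is not the covsel implementation or its API (covsel is an R package): the function names `pearson` and `collinearity_filter`, the 0.7 correlation threshold, the ranking of covariates by their absolute correlation with the response, and the toy data are all assumptions made for illustration. The sketch ranks the candidates, then greedily keeps each covariate only if it is not strongly correlated with one already kept.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def collinearity_filter(covariates, response, threshold=0.7):
    """Greedy collinearity filter (hypothetical helper, not the covsel API).

    covariates: dict mapping covariate name -> list of values
    response:   list of presence/absence values (1/0)
    threshold:  assumed maximum |r| allowed between two kept covariates
    """
    # Rank candidates by strength of association with the response
    # (an assumed ranking criterion, chosen for simplicity).
    ranked = sorted(
        covariates,
        key=lambda name: abs(pearson(covariates[name], response)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep the candidate only if it is not collinear with any kept covariate.
        if all(
            abs(pearson(covariates[name], covariates[k])) <= threshold
            for k in kept
        ):
            kept.append(name)
    return kept


# Toy data: "elev" is almost a mirror image of "temp"; "precip" is independent.
rng = random.Random(42)
n = 200
temp = [rng.gauss(0, 1) for _ in range(n)]
elev = [-t + rng.gauss(0, 0.1) for t in temp]
precip = [rng.gauss(0, 1) for _ in range(n)]
presence = [
    1 if t + 0.5 * p + rng.gauss(0, 0.5) > 0 else 0
    for t, p in zip(temp, precip)
]

selected = collinearity_filter(
    {"temp": temp, "elev": elev, "precip": precip}, presence
)
print(selected)
```

In this toy case only one of the two nearly collinear covariates survives, alongside the independent one. The surviving covariates would then be passed to a regularized model (Step B), which this sketch does not attempt to reproduce.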
Pages: 8