Forward selection of explanatory variables

Cited by: 1778
Authors
Blanchet, F. Guillaume [1]
Legendre, Pierre [1]
Borcard, Daniel [1]
Affiliation
[1] Univ Montreal, Dept Sci Biol, Montreal, PQ H3C 3J7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
forward selection; Moran's eigenvector maps (MEM); non-orthogonal explanatory variables; orthogonal explanatory variables; principal coordinates of neighbor matrices (PCNM); simulation study; Type I error;
DOI
10.1890/07-0986.1
Chinese Library Classification
Q14 [Ecology (Bioecology)]
Subject classification codes
071012; 0713
Abstract
This paper proposes a new way of using forward selection of explanatory variables in regression or canonical redundancy analysis. The classical forward selection method presents two problems: a highly inflated Type I error and an overestimation of the amount of explained variance. Correcting these problems will greatly improve the performance of this very useful method in ecological modeling. To prevent the first problem, we propose a two-step procedure. First, a global test using all explanatory variables is carried out. If, and only if, the global test is significant, one can proceed with forward selection. To prevent overestimation of the explained variance, the forward selection has to be carried out with two stopping criteria: (1) the usual alpha significance level and (2) the adjusted coefficient of multiple determination ($R^2_a$) calculated using all explanatory variables. When forward selection identifies a variable that brings one or the other criterion over the fixed threshold, that variable is rejected, and the procedure is stopped. This improved method is validated by simulations involving univariate and multivariate response data. An ecological example is presented using data from the Bryce Canyon National Park, Utah, USA.
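The two-step procedure described in the abstract can be illustrated with a minimal Python sketch for a single (univariate) response. This is not the authors' implementation. The function names (`forward_select`, `r_squared`, `perm_pvalue`, and the other helpers) are hypothetical, and the tests permute the raw response values, which is a simplifying assumption rather than a choice stated in the abstract. The sketch uses only the standard library so that it runs anywhere.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def residualize(v, basis):
    """Remove from v its projection on each (mutually orthogonal) basis vector."""
    for q in basis:
        c = dot(v, q) / dot(q, q)
        v = [a - c * b for a, b in zip(v, q)]
    return v

def r_squared(y, cols):
    """R^2 of a least-squares fit of y on cols (intercept implied by centering)."""
    yc = center(y)
    basis, explained = [], 0.0
    for x in cols:
        q = residualize(center(x), basis)   # Gram-Schmidt step
        qq = dot(q, q)
        if qq > 1e-12:                      # skip collinear columns
            basis.append(q)
            explained += dot(yc, q) ** 2 / qq
    return explained / dot(yc, yc)

def adjusted(r2, n, m):
    """Adjusted R^2 for n observations and m explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

def perm_pvalue(stat, y, nperm, rng):
    """Share of statistics (observed one included) that are >= the observed one."""
    observed = stat(y)
    yp, hits = list(y), 1
    for _ in range(nperm):
        rng.shuffle(yp)                     # assumption: permute raw responses
        if stat(yp) >= observed:
            hits += 1
    return hits / (nperm + 1)

def forward_select(y, cols, alpha=0.05, nperm=199, seed=0):
    """Return indices of selected columns, in order of entry."""
    rng = random.Random(seed)
    n, m = len(y), len(cols)
    # Threshold for criterion (2): adjusted R^2 of the model with all variables.
    adj_all = adjusted(r_squared(y, cols), n, m)
    # Step 1: global test; forward selection proceeds only if it is significant.
    if perm_pvalue(lambda v: r_squared(v, cols), y, nperm, rng) > alpha:
        return []
    selected, remaining = [], list(range(m))
    while remaining:
        # Candidate that raises R^2 the most given the variables already selected.
        best = max(remaining,
                   key=lambda j: r_squared(y, [cols[i] for i in selected + [j]]))
        sub = [cols[i] for i in selected + [best]]
        red = [cols[i] for i in selected]

        def partial_f(v):
            # Proportional to the partial F statistic (constant df factor dropped).
            full = r_squared(v, sub)
            return (full - r_squared(v, red)) / max(1.0 - full, 1e-12)

        p = perm_pvalue(partial_f, y, nperm, rng)
        adj = adjusted(r_squared(y, sub), n, len(sub))
        # Step 2: reject the candidate and stop if either criterion is exceeded.
        if p > alpha or adj > adj_all:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    gen = random.Random(42)
    X = [[gen.gauss(0, 1) for _ in range(60)] for _ in range(5)]
    y = [3 * a + b + gen.gauss(0, 0.5) for a, b in zip(X[0], X[2])]
    print(forward_select(y, X, nperm=99))
```

Because the adjusted $R^2_a$ threshold comes from the model containing all variables, a candidate can be rejected even when its permutation test is significant. That is the second stopping criterion doing its job of limiting the explained variance.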
Pages: 2623-2632
Number of pages: 10