The relative performance of AIC, AICC and BIC in the presence of unobserved heterogeneity

Cited by: 230
Authors
Brewer, Mark J. [1]
Butler, Adam [2]
Cooksley, Susan L. [3]
Affiliations
[1] Biomath & Stat Scotland, Aberdeen AB15 8QH, Scotland
[2] JCMB, Biomath & Stat Scotland, Kings Bldg, Edinburgh EH9 3JZ, Midlothian, Scotland
[3] James Hutton Inst, Aberdeen AB15 8QH, Scotland
Source
METHODS IN ECOLOGY AND EVOLUTION | 2016 / Vol. 7 / Issue 06
Keywords
Akaike Information Criterion; Bayesian Information Criterion; generalized linear models; likelihood penalization; linear regression; model selection; statistical controversies
DOI
10.1111/2041-210X.12541
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)];
Discipline classification codes
071012; 0713;
Abstract
Model selection is difficult. Even in the apparently straightforward case of choosing between standard linear regression models, there does not yet appear to be consensus in the statistical ecology literature as to the right approach. We review recent works on model selection in ecology and subsequently focus on one aspect in particular: the use of the Akaike Information Criterion (AIC) or its small-sample equivalent, AICC. We create a novel framework for simulation studies and use this to study model selection from simulated data sets with a range of properties, which differ in terms of degree of unobserved heterogeneity. We use the results of the simulation study to suggest an approach for model selection based on ideas from information criteria but requiring simulation. We find that the relative predictive performance of model selection by different information criteria is heavily dependent on the degree of unobserved heterogeneity between data sets. When heterogeneity is small, AIC or AICC are likely to perform well, but if heterogeneity is large, the Bayesian Information Criterion (BIC) will often perform better, due to the stronger penalty afforded. Our conclusion is that the choice of information criterion (or more broadly, the strength of likelihood penalty) should ideally be based upon hypothesized (or estimated from previous data) properties of the population of data sets from which a given data set could have arisen. Relying on a single form of information criterion is unlikely to be universally successful.
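The "stronger penalty" of BIC mentioned in the abstract can be illustrated with the standard textbook definitions of the three criteria. The sketch below is not the authors' simulation framework; the function name `information_criteria` and the two example models (their log-likelihoods, parameter counts and sample size) are hypothetical values chosen only for illustration. It assumes the usual forms AIC = 2k - 2 log L, AICC = AIC + 2k(k + 1)/(n - k - 1), and BIC = k log(n) - 2 log L.

```python
import math


def information_criteria(log_lik, k, n):
    """Return (AIC, AICC, BIC) from the standard definitions.

    log_lik: maximized log-likelihood of the fitted model
    k:       number of estimated parameters
    n:       number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICC is only defined for n > k + 1")
    aic = 2 * k - 2 * log_lik
    # Small-sample correction added on top of AIC.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    # BIC replaces the per-parameter penalty of 2 with log(n).
    bic = k * math.log(n) - 2 * log_lik
    return aic, aicc, bic


# Hypothetical example: two candidate models fitted to the same n = 50 points.
n_obs = 50
small = information_criteria(log_lik=-120.0, k=3, n=n_obs)
large = information_criteria(log_lik=-117.5, k=5, n=n_obs)

for name, s, l in zip(("AIC", "AICC", "BIC"), small, large):
    preferred = "small" if s < l else "large"
    print(f"{name}: small={s:.2f} large={l:.2f} -> prefers {preferred} model")
```

With these made-up inputs, AIC and AICC favour the larger model while BIC favours the smaller one, because the per-parameter penalty log(n) exceeds 2 whenever n is at least 8. Which choice predicts better is exactly what the abstract says depends on the degree of unobserved heterogeneity.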
Pages: 679-692
Number of pages: 14