Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse

Cited by: 863
Authors
Forstmeier, Wolfgang [1]
Schielzeth, Holger [1]
Affiliations
[1] Max Planck Inst Ornithol, D-82319 Seewiesen, Germany
Keywords
Bonferroni correction; Effect size estimation; Generalised linear models; Model selection; Multiple regression; Multiple testing; Parameter estimation; Publication bias; FALSE DISCOVERY RATE; BEHAVIORAL ECOLOGY; STATISTICAL POWER; SELECTION; BONFERRONI; INFERENCE; PUBLICATION; NOISE;
DOI
10.1007/s00265-010-1038-5
Chinese Library Classification (CLC) number
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline classification code
03 ; 0303 ; 030303 ; 04 ; 0402 ;
Abstract
Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies by removing non-significant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one 'significant' effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are over-fitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies ('the winner's curse'). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results.
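The 40% figure quoted in the abstract can be checked with a short calculation. The sketch below assumes that each term in the full model is tested independently at a nominal alpha of 0.05, which is a simplification and not necessarily the authors' exact procedure. The function names (`n_terms`, `familywise_error_rate`, `bonferroni_alpha`) are hypothetical and chosen for illustration only.

```python
from math import comb


def n_terms(n_predictors: int) -> int:
    """Number of tested terms k: main effects plus all two-way interactions."""
    return n_predictors + comb(n_predictors, 2)


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among k tests.

    Assumes the k tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(k: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / k


k = n_terms(4)  # 4 main effects + 6 two-way interactions
print(k, round(familywise_error_rate(k), 3))  # 10 0.401
print(bonferroni_alpha(k))
```

With four predictors there are 10 tested terms, and 1 - 0.95^10 is roughly 0.40, matching the abstract's example. According to the abstract, this independence-based figure is only approached when N is large relative to k; over-fitted models exceed it after simplification.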
Pages: 47-55
Page count: 9