Violating the normality assumption may be the lesser of two evils

Cited by: 322
Authors
Knief, Ulrich [1]
Forstmeier, Wolfgang [2]
Affiliations
[1] Ludwig Maximilian Univ Munich, Fac Biol, Div Evolutionary Biol, Grosshaderner Str 2, D-82152 Planegg Martinsried, Germany
[2] Max Planck Inst Ornithol, Dept Behav Ecol & Evolutionary Genet, D-82319 Seewiesen, Germany
Keywords
Hypothesis testing; Linear model; Normality; Regression; CORRELATION-COEFFICIENT; MIXED MODELS; COUNT DATA; REGRESSION; NONNORMALITY; ROBUSTNESS; ECOLOGY; TESTS; OVERDISPERSION; VARIANCE;
DOI
10.3758/s13428-021-01587-5
Chinese Library Classification number
B841 [Research methods in psychology]
Discipline classification code
040201
Abstract
When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.
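The kind of Monte Carlo check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the exponential distribution, the two-group design, the group size of 20, the number of simulations, and all function and constant names are hypothetical choices made here. Both groups are drawn from the same skewed distribution, so the null hypothesis is true and every significant pooled-variance t-test (the simplest Gaussian linear model) is a false positive.

```python
import math
import random
import statistics

N_PER_GROUP = 20   # assumed group size (hypothetical choice)
# Two-sided 5% critical value of Student's t with 2 * 20 - 2 = 38 degrees
# of freedom. It is only valid for N_PER_GROUP = 20.
T_CRIT_DF38 = 2.024


def pooled_t(a, b):
    """Classical two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


def type_i_error_rate(n_sims=4000, seed=1):
    """Share of simulated null data sets declared significant at alpha = 0.05."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        # Both groups come from the same right-skewed (exponential)
        # distribution, so there is no true group difference.
        a = [rng.expovariate(1.0) for _ in range(N_PER_GROUP)]
        b = [rng.expovariate(1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_DF38:
            false_positives += 1
    return false_positives / n_sims


rate = type_i_error_rate()
print(f"Estimated type I error rate at nominal alpha = 0.05: {rate:.3f}")
```

An estimate close to the nominal 0.05 would be in line with the robustness the abstract reports. Replacing the exponential draws with a distribution that produces influential outliers, or judging at a stricter alpha level, is the kind of variation the abstract flags as more problematic.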
Pages: 2576-2590
Number of pages: 15