Linear regression and the normality assumption

Cited by: 333
Authors
Schmidt, Amand F. [1, 2, 3]
Finan, Chris [1]
Affiliations
[1] UCL, Inst Cardiovasc Sci, Fac Populat Hlth, London WC1E 6BT, England
[2] Univ Groningen, Groningen Res Inst Pharm, Groningen, Netherlands
[3] Univ Med Ctr Utrecht, Div Heart & Lungs, Dept Cardiol, Utrecht, Netherlands
Keywords
Epidemiological methods; Bias; Linear regression; Modeling assumptions; Statistical inference; Big data
DOI
10.1016/j.jclinepi.2017.12.006
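Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Discipline classification code
Abstract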
Objectives: Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary and, worse, may bias model estimates. Study Design and Setting: Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., how often the 95% confidence interval included the true slope coefficient. Results: Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large samples (e.g., where the number of observations per variable is >10), violations of this normality assumption often do not noticeably impact results. In contrast, assumptions about the parametric model, the absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large samples. Conclusion: Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates through the practice of outcome transformations. © 2017 Elsevier Inc. All rights reserved.
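The coverage criterion described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the sample size, number of runs, true slope, intercept, and the choice of centered exponential errors are all assumptions made for illustration, and the interval uses the normal critical value 1.96 rather than a t quantile, which is a reasonable approximation only for large samples.

```python
import numpy as np


def slope_ci_coverage(n_obs=500, n_sims=2000, true_slope=0.5, seed=0):
    """Estimate how often a 95% CI for the OLS slope contains the true slope
    when the errors are skewed (non-normal). All defaults are hypothetical."""
    rng = np.random.default_rng(seed)
    z = 1.96  # normal approximation to the 95% critical value (large n assumed)
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n_obs)
        # Centered exponential errors: mean 0, variance 1, strongly right-skewed.
        errors = rng.exponential(scale=1.0, size=n_obs) - 1.0
        y = 1.0 + true_slope * x + errors

        # Ordinary least squares with an intercept column.
        X = np.column_stack([np.ones(n_obs), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Classical (homoscedastic) standard error of the slope.
        resid = y - X @ beta
        sigma2 = resid @ resid / (n_obs - X.shape[1])
        se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

        lower = beta[1] - z * se_slope
        upper = beta[1] + z * se_slope
        if lower <= true_slope <= upper:
            hits += 1
    return hits / n_sims


coverage = slope_ci_coverage()
print(f"Estimated coverage of the 95% CI: {coverage:.3f}")
```

Under these assumptions, an estimated coverage close to 0.95 would be consistent with the abstract's claim that non-normal errors have little effect on inference in large samples. Lowering `n_obs` shows how the approximation behaves with fewer observations.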
Pages: 146-151
Number of pages: 6