Linear regression and the normality assumption

Cited by: 336
Authors
Schmidt, Amand F. [1, 2, 3]
Finan, Chris [1]
Affiliations
[1] UCL, Inst Cardiovasc Sci, Fac Populat Hlth, London WC1E 6BT, England
[2] Univ Groningen, Groningen Res Inst Pharm, Groningen, Netherlands
[3] Univ Med Ctr Utrecht, Div Heart & Lungs, Dept Cardiol, Utrecht, Netherlands
Keywords
Epidemiological methods; Bias; Linear regression; Modeling assumptions; Statistical inference; Big data
DOI
10.1016/j.jclinepi.2017.12.006
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Objectives: Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary and, worse, may bias model estimates.
Study Design and Setting: Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., the number of times the 95% confidence interval included the true slope coefficient.
Results: Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10), violations of this normality assumption often do not noticeably impact results. In contrast, assumptions on the parametric model, absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large sample size settings.
Conclusion: Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates due to the practice of outcome transformations. © 2017 Elsevier Inc. All rights reserved.
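The coverage criterion described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation: the function name `slope_ci_coverage`, the sample size, the number of replicates, the true slope, and the choice of centered exponential errors are all assumptions made for illustration. It fits ordinary least squares to data with strongly skewed (non-normal) errors and counts how often the nominal 95% confidence interval contains the true slope.

```python
import numpy as np


def slope_ci_coverage(n_obs=200, n_sims=2000, true_slope=0.5, seed=0):
    """Estimate coverage of the nominal 95% CI for an OLS slope
    when the errors are skewed rather than normal.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(n_sims):
        x = rng.normal(size=n_obs)
        # Centered exponential errors: mean zero, constant variance, right-skewed.
        errors = rng.exponential(scale=1.0, size=n_obs) - 1.0
        y = 1.0 + true_slope * x + errors

        # Ordinary least squares with an intercept column.
        X = np.column_stack([np.ones(n_obs), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Classical (model-based) standard error of the slope.
        resid = y - X @ beta
        sigma2 = resid @ resid / (n_obs - 2)
        se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

        # 1.96 is the normal approximation to the t quantile (fine for large n).
        lower = beta[1] - 1.96 * se_slope
        upper = beta[1] + 1.96 * se_slope
        covered += int(lower <= true_slope <= upper)
    return covered / n_sims


if __name__ == "__main__":
    print(f"Estimated coverage: {slope_ci_coverage():.3f}")
```

Under these assumptions the estimated coverage should sit close to the nominal 95% level, in line with the abstract's claim that non-normal errors have little practical effect in larger samples. Rerunning with a small `n_obs` or with heteroscedastic errors is one way to explore the assumptions the abstract says remain influential.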
Pages: 146-151
Page count: 6