Multicollinearity and misleading statistical results

Cited by: 1323
Author
Kim, Jong Hae [1]
Affiliation
[1] Daegu Catholic Univ, Dept Anesthesiol & Pain Med, Sch Med, 33 Duryugongwon Ro 17 Gil, Daegu 42472, South Korea
Keywords
Biomedical research; Biostatistics; Multivariable analysis; Regression; Statistical bias; Statistical data analysis
DOI
10.4097/kja.19087
Chinese Library Classification number
R614 [Anesthesiology]
Discipline classification code
100217
Abstract
Multicollinearity is a high degree of linear intercorrelation between explanatory variables in a multiple regression model, and it leads to incorrect results of regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination ($R_h^2$) of a multiple regression model with one explanatory variable ($X_h$) as the model's response variable and the others ($X_i$, $i \neq h$) as its explanatory variables. The variance ($\sigma_h^2$) of each regression coefficient in the final regression model is proportional to the VIF, $1/(1 - R_h^2)$. Hence, an increase in $R_h^2$ (strong multicollinearity) increases $\sigma_h^2$. A larger $\sigma_h^2$ produces unreliable probability values and confidence intervals for the regression coefficients. The square root of the ratio of the maximum eigenvalue to each eigenvalue from the correlation matrix of standardized explanatory variables is referred to as the condition index. The condition number is the maximum condition index. Multicollinearity is present when the VIF is higher than 5 to 10 or the condition indices are higher than 10 to 30. However, these measures cannot indicate which explanatory variables are multicollinear. VDPs obtained from the eigenvectors can identify the multicollinear variables by showing the extent of the inflation of $\sigma_h^2$ according to each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 are themselves higher than 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
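The three diagnostics described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code: the function name `collinearity_diagnostics`, the synthetic data, and the choice to work directly from the correlation matrix of the explanatory variables (as the abstract describes) are all assumptions made for this example. It relies on the identity that the VIF of $X_h$, $1/(1 - R_h^2)$, equals the $h$-th diagonal element of the inverse correlation matrix.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Return (vif, cond_idx, vdp) for the columns of X.

    Hypothetical helper for illustration only.
    vdp[h, k] is the share of variable h's variance inflation
    that is attributable to the k-th eigenvalue (condition index k).
    """
    X = np.asarray(X, dtype=float)
    # Correlation matrix of the explanatory variables (columns of X).
    R = np.corrcoef(X, rowvar=False)

    # VIF_h = 1 / (1 - R_h^2) = h-th diagonal element of R^{-1}.
    vif = np.diag(np.linalg.inv(R))

    # Eigen-decomposition of the symmetric correlation matrix;
    # column k of eigvecs is the eigenvector for eigvals[k].
    eigvals, eigvecs = np.linalg.eigh(R)

    # Condition index: sqrt(max eigenvalue / each eigenvalue).
    cond_idx = np.sqrt(eigvals.max() / eigvals)

    # phi[h, k] = v_hk^2 / lambda_k; each row sums to VIF_h,
    # so dividing by the row sum gives proportions.
    phi = eigvecs**2 / eigvals
    vdp = phi / phi.sum(axis=1, keepdims=True)
    return vif, cond_idx, vdp


# Synthetic data (an assumption for this example): x2 is nearly a copy
# of x1, while x3 is generated independently of both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif, cond_idx, vdp = collinearity_diagnostics(X)
k = int(np.argmax(cond_idx))  # column with the largest condition index

print("VIF:", vif)
print("Condition indices:", cond_idx)
print("Condition number:", cond_idx.max())
print("VDPs for the largest condition index:", vdp[:, k])
```

Reading the output against the thresholds in the abstract: variables whose VDPs on a shared high condition index both exceed 0.8 to 0.9 would be flagged as multicollinear, which here is expected to be the pair `x1` and `x2`. Other conventions exist (for example, scaling uncentered columns and including the intercept), and they can give different condition indices.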
Pages: 558-569
Number of pages: 12