Using principal components for estimating logistic regression with high-dimensional multicollinear data

Cited by: 133
Authors
Aguilera, AM [1]
Escabias, M [1]
Valderrama, MJ [1]
Affiliations
[1] Univ Granada, Dept Stat & OR, Granada, Spain
Keywords
logistic regression; multicollinearity; principal components
DOI
10.1016/j.csda.2005.03.011
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The logistic regression model is used to predict a binary response variable from a set of explanatory variables. When there is multicollinearity (high dependence) among the predictors, the model parameters are estimated inaccurately and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explanatory variables usually needed to explain the response. In order to improve the estimation of the logistic model parameters under multicollinearity, and to reduce the dimension of the problem with continuous covariates, it is proposed to use a reduced set of optimum principal components of the original predictors as covariates of the logistic model. Finally, the performance of the proposed principal component logistic regression model is analyzed in a simulation study that compares different methods for selecting the optimum principal components. (c) 2005 Elsevier B.V. All rights reserved.
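The idea summarized in the abstract (replace collinear predictors with a few of their principal components, fit the logistic model on those, then map the coefficients back) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the simulated data, the helper names (`center`, `covariance`, `top_components`, `fit_logistic`), power iteration for the eigenvectors, plain gradient ascent for the fit, and the choice to keep the first two components by variance. The abstract says the paper compares several ways of selecting components; this sketch does not reproduce them.

```python
import math
import random

def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def center(X):
    # Subtract column means from every row.
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def covariance(Xc):
    # Sample covariance matrix of centered data.
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_components(S, k, iters=500):
    # Leading k eigenpairs of a symmetric matrix: power iteration + deflation.
    p = len(S)
    S = [row[:] for row in S]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(iters):
            w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(S[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        for i in range(p):          # remove the component just found
            for j in range(p):
                S[i][j] -= lam * v[i] * v[j]
    return pairs

def fit_logistic(Z, y, lr=0.5, iters=1000):
    # Gradient ascent on the mean log-likelihood; coef[0] is the intercept.
    n, k = len(Z), len(Z[0])
    coef = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for z, yi in zip(Z, y):
            eta = coef[0] + sum(c * zj for c, zj in zip(coef[1:], z))
            err = yi - sigmoid(eta)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * z[j]
        coef = [c + lr * g / n for c, g in zip(coef, grad)]
    return coef

# Simulated data (assumption): x2 is almost a copy of x1, x3 is independent.
random.seed(0)
X, y = [], []
for _ in range(200):
    x1 = random.gauss(0, 1)
    x2 = x1 + random.gauss(0, 0.05)
    x3 = random.gauss(0, 1)
    X.append([x1, x2, x3])
    y.append(1 if random.random() < sigmoid(x1 + x2 - x3) else 0)

Xc = center(X)
pairs = top_components(covariance(Xc), k=2)   # keep 2 of 3 components
V = [v for _, v in pairs]
# Principal component scores: projections of centered rows onto V.
Z = [[sum(r[j] * v[j] for j in range(3)) for v in V] for r in Xc]
gamma = fit_logistic(Z, y)
# Coefficients on the (centered) original predictors: beta = V^T gamma.
beta = [sum(gamma[m + 1] * V[m][j] for m in range(2)) for j in range(3)]
print("eigenvalues:", [round(lam, 3) for lam, _ in pairs])
print("beta:", [round(b, 3) for b in beta])
```

Because the discarded component is essentially the contrast between `x1` and `x2`, the reconstructed coefficients for those two predictors come out nearly equal, which is one way to see how dropping low-variance components stabilizes estimates under collinearity.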
Pages: 1905-1924
Page count: 20