Using principal components for estimating logistic regression with high-dimensional multicollinear data

Cited by: 133
Authors
Aguilera, AM [1]
Escabias, M [1]
Valderrama, MJ [1]
Affiliations
[1] Univ Granada, Dept Stat & OR, Granada, Spain
Keywords
logistic regression; multicollinearity; principal components
DOI
10.1016/j.csda.2005.03.011
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The logistic regression model is used to predict a binary response variable from a set of explanatory variables. When there is multicollinearity (high dependence) among the predictors, the model parameters are estimated inaccurately and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explanatory variables usually needed to explain the response. To improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates, the paper proposes using a reduced set of optimum principal components of the original predictors as the covariates of the logistic model. Finally, the performance of the proposed principal component logistic regression model is analyzed in a simulation study that compares different methods for selecting the optimum principal components. (c) 2005 Elsevier B.V. All rights reserved.
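As a rough illustration of the idea summarized in the abstract, the following Python sketch fits a logistic model on a reduced set of principal components and then maps the coefficients back to the original predictors. It is a minimal sketch under stated assumptions, not the authors' procedure: the components are simply the first `n_components` ordered by explained variance (the paper compares several selection methods, which are not reproduced here), the data are simulated with an arbitrary design, and the names `fit_logistic` and `pc_logistic` are hypothetical helpers introduced for this example.

```python
import numpy as np

def fit_logistic(Z, y, n_iter=50, ridge=1e-8):
    """Newton-Raphson fit of a logistic model; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(Z)), Z])
    gamma = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ gamma))
        w = p * (1.0 - p)
        grad = A.T @ (y - p)
        # Tiny ridge term only for numerical stability (an assumption of this sketch).
        hess = A.T @ (A * w[:, None]) + ridge * np.eye(A.shape[1])
        step = np.linalg.solve(hess, grad)
        gamma += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return gamma

def pc_logistic(X, y, n_components):
    """Logistic regression on the first n_components principal components of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal directions are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T           # loadings, shape (p, n_components)
    Z = Xc @ V                        # principal component scores
    gamma = fit_logistic(Z, y)
    beta = V @ gamma[1:]              # coefficients on the original predictors
    intercept = gamma[0] - mean @ beta
    return intercept, beta

# Simulated multicollinear data: six predictors driven by two latent factors.
rng = np.random.default_rng(0)
n = 400
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.05 * rng.normal(size=(n, 6))
prob = 1.0 / (1.0 + np.exp(-(0.5 + latent[:, 0] - latent[:, 1])))
y = (rng.random(n) < prob).astype(float)

intercept, beta = pc_logistic(X, y, n_components=2)
accuracy = np.mean((intercept + X @ beta > 0) == (y == 1))
```

Because the fit is carried out on a few uncorrelated component scores rather than on the highly dependent original predictors, the Newton step solves a well-conditioned system; the back-transformed `beta` can then be read on the scale of the original variables.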
Pages: 1905-1924
Number of pages: 20
Related papers
50 in total
  • [21] CONSISTENCY OF AIC AND BIC IN ESTIMATING THE NUMBER OF SIGNIFICANT COMPONENTS IN HIGH-DIMENSIONAL PRINCIPAL COMPONENT ANALYSIS
    Bai, Zhidong
    Choi, Kwok Pui
    Fujikoshi, Yasunori
    ANNALS OF STATISTICS, 2018, 46 (03) : 1050 - 1076
  • [22] The cross-validated AUC for MCP-logistic regression with high-dimensional data
    Jiang, Dingfeng
    Huang, Jian
    Zhang, Ying
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2013, 22 (05) : 505 - 518
  • [23] High-dimensional pseudo-logistic regression and classification with applications to gene expression data
    Zhang, Chunming
    Fu, Haoda
    Jiang, Yuan
    Yu, Tao
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 452 - 470
  • [24] A High-Dimensional Counterpart for the Ridge Estimator in Multicollinear Situations
    Arashi, Mohammad
    Norouzirad, Mina
    Roozbeh, Mahdi
    Khan, Naushad Mamode
    MATHEMATICS, 2021, 9 (23)
  • [25] Identification of outlying and influential data with principal components regression estimation in binary logistic regression
    Ozkale, M. Revan
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2021, 50 (03) : 609 - 630
  • [26] Efficient posterior sampling for high-dimensional imbalanced logistic regression
    Sen, Deborshee
    Sachs, Matthias
    Lu, Jianfeng
    Dunson, David B.
    BIOMETRIKA, 2020, 107 (04) : 1005 - 1012
  • [27] A ridge penalized principal-components approach based on heritability for high-dimensional data
    Wang, Yuanjia
    Fang, Yixin
    Jin, Man
    HUMAN HEREDITY, 2007, 64 (03) : 182 - 191
  • [28] Penalized logistic regression for high-dimensional DNA methylation data with case-control studies
    Sun, Hokeun
    Wang, Shuang
    BIOINFORMATICS, 2012, 28 (10) : 1368 - 1375
  • [29] HIGH-DIMENSIONAL ANALYSIS OF SEMIDEFINITE RELAXATIONS FOR SPARSE PRINCIPAL COMPONENTS
    Amini, Arash A.
    Wainwright, Martin J.
    ANNALS OF STATISTICS, 2009, 37 (5B) : 2877 - 2921
  • [30] High-dimensional analysis of semidefinite relaxations for sparse principal components
    Amini, Arash A.
    Wainwright, Martin J.
    2008 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS, VOLS 1-6, 2008 : 2454 - 2458