Using principal components for estimating logistic regression with high-dimensional multicollinear data

Cited by: 133
Authors
Aguilera, AM [1]
Escabias, M [1]
Valderrama, MJ [1]
Affiliation
[1] Univ Granada, Dept Stat & OR, Granada, Spain
Keywords
logistic regression; multicollinearity; principal components
DOI
10.1016/j.csda.2005.03.011
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The logistic regression model is used to predict a binary response variable in terms of a set of explicative ones. When there is multicollinearity (high dependence) among the predictors, the estimation of the model parameters is not very accurate, and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explicative variables usually needed to explain the response. In order to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates, it is proposed to use a reduced set of optimum principal components of the original predictors as covariates of the logistic model. Finally, the performance of the proposed principal component logistic regression model is analyzed by developing a simulation study in which different methods for selecting the optimum principal components are compared. (c) 2005 Elsevier B.V. All rights reserved.
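The core idea of the abstract (regress the binary response on a few principal components of the predictors instead of on the collinear predictors themselves) can be illustrated with a minimal sketch in Python using NumPy. It rests on several assumptions that are not taken from the paper: components are kept simply in order of explained variance (the paper compares several selection methods, which this sketch does not reproduce), the model is fitted by standard IRLS (Newton-Raphson), and the function names `fit_logistic_irls` and `pc_logistic_regression` are hypothetical. Because the component scores are linear combinations of the centred predictors, the fitted component coefficients can be mapped back to coefficients on the original variables, which keeps an odds-ratio interpretation available.

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=100, tol=1e-8):
    """Ordinary logistic regression fitted by IRLS (Newton-Raphson).
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = p * (1.0 - p)                       # IRLS weights
        hessian = X.T @ (X * w[:, None])        # X' W X
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def pc_logistic_regression(X, y, n_components):
    """Hypothetical helper: logistic regression on the first
    `n_components` principal components (by explained variance)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Right singular vectors of the centred data are the PC loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T                     # p x s loading matrix
    Z = Xc @ V                                  # n x s matrix of PC scores
    design = np.column_stack([np.ones(len(y)), Z])
    gamma = fit_logistic_irls(design, y)        # coefficients on the PCs
    # logit(p) = g0 + (x - mean)' V g, so beta = V g, beta0 = g0 - mean' beta
    beta = V @ gamma[1:]
    beta0 = gamma[0] - mean @ beta
    return beta0, beta, gamma

# Usage on simulated data: five nearly collinear predictors.
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=(n, 1))
X = base + 0.05 * rng.normal(size=(n, 5))
true_logit = 0.5 + 1.0 * base[:, 0]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta0, beta, gamma = pc_logistic_regression(X, y, n_components=1)
print("intercept:", beta0)
print("coefficients on original predictors:", beta)
```

If all components are kept, the model is only a linear reparameterization of the ordinary logistic regression and gives the same fit; the dimension reduction, and the stabilizing effect under multicollinearity, come from dropping components.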
Pages: 1905-1924
Number of pages: 20
Related papers
50 in total
  • [1] Bayesian regression based on principal components for high-dimensional data
    Lee, Jaeyong
    Oh, Hee-Seok
    JOURNAL OF MULTIVARIATE ANALYSIS, 2013, 117 : 175 - 192
  • [2] Estimating motorized travel mode choice using classifiers: An application for high-dimensional multicollinear data
    Lindner, Anabele
    Pitombo, Cira Souza
    Cunha, Andre Luiz
    TRAVEL BEHAVIOUR AND SOCIETY, 2017, 6 : 100 - 109
  • [4] Regression methods for high dimensional multicollinear data
    Aucott, LS
    Garthwaite, PH
    Currall, J
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2000, 29 (04) : 1021 - 1037
  • [5] On inference in high-dimensional logistic regression models with separated data
    Lewis, R. M.
    Battey, H. S.
    BIOMETRIKA, 2024, 111 (03)
  • [6] Classification of High-Dimensional Data with Ensemble of Logistic Regression Models
    Lim, Noha
    Ahn, Hongshik
    Moon, Hojin
    Chen, James J.
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2010, 20 (01) : 160 - 171
  • [7] Using synthetic data and dimensionality reduction in high-dimensional classification via logistic regression
    Zarei, Shaho
    Mohammadpour, Adel
    COMPUTATIONAL METHODS FOR DIFFERENTIAL EQUATIONS, 2019, 7 (04) : 626 - 634
  • [8] High-Dimensional Classification by Sparse Logistic Regression
    Abramovich, Felix
    Grinshtein, Vadim
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2019, 65 (05) : 3068 - 3079
  • [9] The Impact of Regularization on High-dimensional Logistic Regression
    Salehi, Fariborz
    Abbasi, Ehsan
    Hassibi, Babak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [10] Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data
    Alharthi, Aiedh Mrisi
    Lee, Muhammad Hisyam
    Algamal, Zakariya Yahya
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2022, 18 (02) : 40 - 54