Using principal components for estimating logistic regression with high-dimensional multicollinear data

Cited by: 133
Authors
Aguilera, AM [1]
Escabias, M [1]
Valderrama, MJ [1]
Affiliation
[1] Univ Granada, Dept Stat & OR, Granada, Spain
Keywords
logistic regression; multicollinearity; principal components
DOI
10.1016/j.csda.2005.03.011
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The logistic regression model is used to predict a binary response variable in terms of a set of explanatory variables. When there is multicollinearity (high dependence) among the predictors, the model parameters are estimated inaccurately and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explanatory variables usually needed to explain the response. To improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates, the authors propose using a reduced set of optimum principal components of the original predictors as covariates of the logistic model. Finally, the performance of the proposed principal component logistic regression model is analyzed through a simulation study in which different methods for selecting the optimum principal components are compared. © 2005 Elsevier B.V. All rights reserved.
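The general idea in the abstract, regressing the binary response on principal component scores instead of on the collinear predictors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the synthetic data, the helper names (`leading_component`, `fit_logistic_1d`), the use of power iteration and gradient ascent, and above all the choice to keep only the first component by explained variance are all hypothetical. The paper itself compares several methods for selecting the components, which this sketch does not reproduce.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def center(X):
    # Subtract column means so the components describe the covariance.
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means

def leading_component(Xc, iters=200):
    # Power iteration on the sample covariance matrix of centred data.
    n, p = len(Xc), len(Xc[0])
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
         for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_logistic_1d(z, y, step=0.1, iters=3000):
    # Gradient ascent on the mean log-likelihood of logit(p) = alpha + gamma * z.
    alpha, gamma = 0.0, 0.0
    n = len(z)
    for _ in range(iters):
        resid = [yi - sigmoid(alpha + gamma * zi) for zi, yi in zip(z, y)]
        alpha += step * sum(resid) / n
        gamma += step * sum(r * zi for r, zi in zip(resid, z)) / n
    return alpha, gamma

# Synthetic, highly collinear predictors (assumption: three noisy copies
# of a single latent variable that also drives the response).
random.seed(0)
n = 300
latent = [random.gauss(0, 1) for _ in range(n)]
X = [[t + 0.1 * random.gauss(0, 1) for _ in range(3)] for t in latent]
y = [1 if random.random() < sigmoid(2.0 * t) else 0 for t in latent]

Xc, means = center(X)
v = leading_component(Xc)                       # first principal direction
z = [sum(a * b for a, b in zip(row, v)) for row in Xc]  # component scores
alpha, gamma = fit_logistic_1d(z, y)            # logistic fit on the scores

# Map the fitted slope back to coefficients on the centred original predictors.
beta = [gamma * vj for vj in v]

pred = [1 if sigmoid(alpha + gamma * zi) >= 0.5 else 0 for zi in z]
accuracy = sum(int(a == b) for a, b in zip(pred, y)) / n
print("direction:", [round(c, 3) for c in v])
print("beta:", [round(b, 3) for b in beta])
print("training accuracy:", round(accuracy, 3))
```

Because the three predictors are nearly identical, a direct fit would struggle to apportion the effect among them, whereas the single component score gives one stable slope that is then spread across the predictors through the loadings in `v`.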
Pages: 1905-1924
Number of pages: 20
Related papers
50 in total
  • [41] Robust and sparse estimation methods for high-dimensional linear and logistic regression
    Kurnaz, Fatma Sevinc
    Hoffmann, Irene
    Filzmoser, Peter
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2018, 172 : 211 - 222
  • [42] Minimax Sparse Logistic Regression for Very High-Dimensional Feature Selection
    Tan, Mingkui
    Tsang, Ivor W.
    Wang, Li
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2013, 24 (10) : 1609 - 1622
  • [43] Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data
    Sill, Martin
    Saadati, Maral
    Benner, Axel
    BIOINFORMATICS, 2015, 31 (16) : 2683 - 2690
  • [44] STATISTICAL INFERENCE FOR GENETIC RELATEDNESS BASED ON HIGH-DIMENSIONAL LOGISTIC REGRESSION
    Ma, Rong
    Guo, Zijian
    Cai, T. Tony
    Li, Hongzhe
    STATISTICA SINICA, 2024, 34 (02) : 1023 - 1043
  • [45] Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model
    Kim, Hyunjin
    Lee, Eun Ryung
    Park, Seyoung
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [46] Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression
    Insolia, Luca
    Kenney, Ana
    Calovi, Martina
    Chiaromonte, Francesca
    STATS, 2021, 4 (03) : 665 - 681
  • [47] SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
    Yadlowsky, Steve
    Yun, Taedong
    McLean, Cory
    D'Amour, Alexander
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [48] Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration
    Islam, Shofiqul
    Anand, Sonia
    Hamid, Jemila
    Thabane, Lehana
    Beyene, Joseph
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2017, 16 (03) : 199 - 216
  • [49] HIGH-DIMENSIONAL ISING MODEL SELECTION USING l1-REGULARIZED LOGISTIC REGRESSION
    Ravikumar, Pradeep
    Wainwright, Martin J.
    Lafferty, John D.
    ANNALS OF STATISTICS, 2010, 38 (03) : 1287 - 1319
  • [50] MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction
    Zhu, Hongtu
    Shen, Dan
    Peng, Xuewei
    Liu, Leo Yufeng
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (519) : 1009 - 1021