Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data

Cited by: 2
Authors
Alharthi, Aiedh Mrisi [1, 2]
Lee, Muhammad Hisyam [1]
Algamal, Zakariya Yahya [3]
Affiliations
[1] Univ Teknol Malaysia, Dept Math Sci, Skudai, Malaysia
[2] Taif Univ, Dept Math, At Taif, Saudi Arabia
[3] Univ Mosul, Dept Stat & Informat, Mosul, Iraq
Keywords
high-dimensional data; feature selection; missing data; multiple imputations; penalized regression; MULTIPLE IMPUTATION; VARIABLE SELECTION; ALGORITHM; REGULARIZATION;
DOI
10.3991/ijoe.v18i02.25047
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Data analysis without adequate handling of missing values may lead to inconsistent and biased estimates. Although multiple imputation has become a widely used approach to handling missing data, researchers still routinely encounter missing data in their studies. In high-dimensional data, penalized regression is a popular technique for performing feature selection and coefficient estimation simultaneously. However, one of the most vital issues with high-dimensional data is that it often contains large quantities of missing data that common multiple imputation approaches may not handle correctly. Therefore, this study uses imputation penalized regression models, an extension of the penalized methods, to impute missing values in high-dimensional data and improve performance. The method was applied to real-life high-dimensional datasets with different numbers of features, sample sizes, and missing-data rates to evaluate its efficiency. The method was also compared with other existing imputation penalized methods for high-dimensional data. The comparative experimental results indicate that the proposed method outperforms its competitors, achieving higher sensitivity, specificity, and classification accuracy.
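To illustrate the general idea named in the abstract (impute first, then fit a penalized logistic regression that selects features and estimates coefficients at once), here is a minimal sketch in plain Python. It is not the authors' method: it assumes single mean imputation as a stand-in for multiple imputation, an L1 (lasso) penalty fitted by proximal gradient descent, and hypothetical function names (`mean_impute`, `soft_threshold`, `fit_l1_logistic`) chosen only for this example.

```python
import math


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: single mean imputation, used here only as a simple stand-in
    for the multiple-imputation step discussed in the abstract.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.1, step=0.5, n_iter=200):
    """Minimize mean log-loss + lam * ||beta||_1 by proximal gradient descent.

    The intercept is left unpenalized. Coefficients driven exactly to zero
    correspond to features dropped by the penalty.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current iterate.
        residuals = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(residuals, X)) / n
            beta[j] = soft_threshold(beta[j] - step * grad_j, step * lam)
    return intercept, beta


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 has a missing value.
    raw = [[-2.0, 0.1], [-1.0, None], [1.0, 0.1], [2.0, 0.1]]
    labels = [0, 0, 1, 1]
    intercept, beta = fit_l1_logistic(mean_impute(raw), labels)
    print(intercept, beta)
```

With multiple imputation, the imputation step would be repeated to produce several completed datasets and the fits would then be combined; how the paper performs that combination is not stated in this record.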
Pages: 40-54
Number of pages: 15
Related papers
50 in total
  • [1] Penalized logistic regression for high-dimensional DNA methylation data with case-control studies
    Sun, Hokeun
    Wang, Shuang
    BIOINFORMATICS, 2012, 28 (10) : 1368 - 1375
  • [2] Ensemble of penalized logistic models for classification of high-dimensional data
    Ijaz, Musarrat
    Asghar, Zahid
    Gul, Asma
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (07) : 2072 - 2088
  • [3] DOUBLY PENALIZED ESTIMATION IN ADDITIVE REGRESSION WITH HIGH-DIMENSIONAL DATA
    Tan, Zhiqiang
    Zhang, Cun-Hui
    ANNALS OF STATISTICS, 2019, 47 (05) : 2567 - 2600
  • [4] A MODEL OF DOUBLE DESCENT FOR HIGH-DIMENSIONAL LOGISTIC REGRESSION
    Deng, Zeyu
    Kammoun, Abla
    Thrampoulidis, Christos
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4267 - 4271
  • [5] Penalized logistic regression based on L1/2 penalty for high-dimensional DNA methylation data
    Jiang, Hong-Kun
    Liang, Yong
    TECHNOLOGY AND HEALTH CARE, 2020, 28 : S161 - S171
  • [6] Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification
    Algamal, Zakariya Yahya
    Lee, Muhammad Hisyam
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (23) : 9326 - 9332
  • [7] Penalized weighted smoothed quantile regression for high-dimensional longitudinal data
    Song, Yanan
    Han, Haohui
    Fu, Liya
    Wang, Ting
    STATISTICS IN MEDICINE, 2024, 43 (10) : 2007 - 2042
  • [8] Penalized Gaussian Process Regression and Classification for High-Dimensional Nonlinear Data
    Yi, G.
    Shi, J. Q.
    Choi, T.
    BIOMETRICS, 2011, 67 (04) : 1285 - 1294
  • [9] On inference in high-dimensional logistic regression models with separated data
    Lewis, R. M.
    Battey, H. S.
    BIOMETRIKA, 2024, 111 (03)
  • [10] Classification of High-Dimensional Data with Ensemble of Logistic Regression Models
    Lim, Noha
    Ahn, Hongshik
    Moon, Hojin
    Chen, James J.
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2010, 20 (01) : 160 - 171