Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data

Times Cited: 2
Authors
Alharthi, Aiedh Mrisi [1 ,2 ]
Lee, Muhammad Hisyam [1 ]
Algamal, Zakariya Yahya [3 ]
Affiliations
[1] Univ Teknol Malaysia, Dept Math Sci, Skudai, Malaysia
[2] Taif Univ, Dept Math, At Taif, Saudi Arabia
[3] Univ Mosul, Dept Stat & Informat, Mosul, Iraq
Keywords
high-dimensional data; feature selection; missing data; multiple imputation; penalized regression; variable selection; algorithm; regularization
DOI
10.3991/ijoe.v18i02.25047
CLC Number
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Analyses performed without adequate handling of missing values may lead to inconsistent and biased estimates. Researchers routinely encounter missing data in their studies, and multiple imputation has become a widely used approach for handling it. In high-dimensional data, penalized regression is a popular technique for performing feature selection and coefficient estimation simultaneously. However, high-dimensional data often contain large amounts of missing values for which common multiple imputation approaches may not work correctly. This study therefore uses imputation penalized regression models, an extension of penalized methods, to impute missing values and improve performance in high-dimensional data. The method was applied to real-life high-dimensional datasets with different numbers of features, sample sizes, and missing-data rates to evaluate its efficiency, and it was compared with other existing imputation penalized methods for high-dimensional data. The comparative experimental results indicate that the proposed method outperforms its competitors, achieving higher sensitivity, specificity, and classification accuracy.
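For orientation only, the sketch below illustrates the general class of approach the abstract describes (imputation followed by L1-penalized logistic regression with selection-quality metrics), not the authors' exact procedure. It assumes scikit-learn's IterativeImputer, a chained-equations imputer that produces a single completed dataset here, whereas proper multiple imputation would repeat the step with different seeds and pool the results; the simulated data, the 20% missingness rate, and the penalty strength C=0.1 are illustrative assumptions.

```python
# Generic sketch: chained-equation imputation + LASSO-penalized logistic regression
# on a high-dimensional dataset (p > n) with values missing completely at random.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# High-dimensional setting: more features (p = 200) than samples (n = 100).
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)

# Inject roughly 20% missing values completely at random.
mask = rng.random(X.shape) < 0.20
X_missing = X.copy()
X_missing[mask] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.3, stratify=y, random_state=0)

# Chained-equation imputation (one completed dataset; repeat and pool for true
# multiple imputation).
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

# L1-penalized logistic regression: feature selection and estimation at once.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_train_imp, y_train)

# Evaluation metrics used in the paper: sensitivity, specificity, accuracy.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test_imp)).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}  "
      f"accuracy={accuracy:.3f}")
print("non-zero coefficients (selected features):", int(np.sum(clf.coef_ != 0)))
```

In a full multiple-imputation workflow, the imputation and model-fitting steps would be run several times and the resulting coefficient estimates or predictions pooled, which is where imputation penalized regression methods differ from the single-pass sketch above.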
Pages: 40-54
Number of Pages: 15