Development of Imputation Methods for Missing Data in Multiple Linear Regression Analysis

Cited by: 3
Authors
Thongsri, Thidarat [1]
Samart, Klairung [1]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
missing data; imputation method; composite method; multiple linear regression; hot deck imputation
DOI
10.1134/S1995080222140323
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Missing data is a common issue in many domains of study. If this issue is disregarded, erroneous conclusions may be reached. This study's objective is to develop and compare the efficiency of eight imputation methods: hot deck imputation (HD), k-nearest neighbors imputation (KNN), stochastic regression imputation (SR), predictive mean matching imputation (PMM), random forest imputation (RF), stochastic regression random forest with equivalent weight imputation (SREW), k-nearest random forest with equivalent weight imputation (KREW), and k-nearest stochastic regression and random forest with equivalent weight imputation (KSREW). The simulation was run using sample sizes of 30, 60, 100, and 150, and missing percentages of 10%, 20%, 30%, and 40%. The average mean square error (AMSE) was used to compare efficiency. The results reveal that the proposed composite approaches outperformed the single ones, particularly the three-component method KSREW. Increasing the number of components to a four-component method, on the other hand, had no effect on imputation performance.
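The idea of an equal-weight composite and its comparison by AMSE can be illustrated with a minimal sketch. Everything here is an assumption rather than the authors' procedure: the abstract does not say which variable is missing or how the weights are applied, so the sketch assumes values missing completely at random in the response only, reads "equivalent weight" as a plain average of the component imputations, and combines only two components (KNN and SR), leaving out random forest to stay dependency-free. The function names and the simulated regression model are hypothetical.

```python
import numpy as np

def knn_impute(X, y, miss, k=5):
    """Fill each missing y with the mean y of its k nearest complete cases."""
    obs = ~miss
    y_hat = y.copy()
    for i in np.where(miss)[0]:
        dist = np.linalg.norm(X[obs] - X[i], axis=1)  # Euclidean distance on X
        nearest = np.argsort(dist)[:k]
        y_hat[i] = y[obs][nearest].mean()
    return y_hat

def sr_impute(X, y, miss, rng):
    """Stochastic regression: OLS prediction plus a normal residual draw."""
    obs = ~miss
    A = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    resid = y[obs] - A[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - A.shape[1]))
    y_hat = y.copy()
    y_hat[miss] = A[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return y_hat

def equal_weight(*imputations):
    """Assumed composite rule: simple average of the component imputations."""
    return np.mean(imputations, axis=0)

def mse(y_true, y_imp, miss):
    """Mean square error over the imputed positions only."""
    return np.mean((y_true[miss] - y_imp[miss]) ** 2)

def simulate(n=100, miss_rate=0.2, reps=200, seed=0):
    """Average the per-replicate MSE of each method (AMSE)."""
    rng = np.random.default_rng(seed)
    totals = {"KNN": 0.0, "SR": 0.0, "KNN+SR": 0.0}
    for _ in range(reps):
        # Hypothetical data-generating model with three predictors.
        X = rng.normal(size=(n, 3))
        y_true = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)
        miss = np.zeros(n, dtype=bool)
        miss[rng.choice(n, size=int(miss_rate * n), replace=False)] = True
        y = y_true.copy()
        y[miss] = np.nan
        y_knn = knn_impute(X, y, miss)
        y_sr = sr_impute(X, y, miss, rng)
        y_mix = equal_weight(y_knn, y_sr)
        totals["KNN"] += mse(y_true, y_knn, miss)
        totals["SR"] += mse(y_true, y_sr, miss)
        totals["KNN+SR"] += mse(y_true, y_mix, miss)
    return {name: total / reps for name, total in totals.items()}

# One cell of a design like the one described: n = 30 with 40% missing.
print(simulate(n=30, miss_rate=0.4, reps=50))
```

Under this averaging rule the composite's squared error at each point cannot exceed the larger of its components' squared errors, which gives some intuition for why a composite might be competitive; whether it beats the better single method depends on the data and is not guaranteed by the sketch.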
Pages: 3390-3399
Number of pages: 10