Development of Imputation Methods for Missing Data in Multiple Linear Regression Analysis

Cited by: 4
Authors
Thongsri, Thidarat [1]
Samart, Klairung [1]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
missing data; imputation method; composite method; multiple linear regression; hot deck imputation
DOI
10.1134/S1995080222140323
CLC number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Missing data is a common issue in many domains of study. If this issue is disregarded, erroneous conclusions may be reached. This study's objective is to develop and compare the efficiency of eight imputation methods: hot deck imputation (HD), k-nearest neighbors imputation (KNN), stochastic regression imputation (SR), predictive mean matching imputation (PMM), random forest imputation (RF), stochastic regression random forest with equivalent weight imputation (SREW), k-nearest random forest with equivalent weight imputation (KREW), and k-nearest stochastic regression and random forest with equivalent weight imputation (KSREW). The simulation was run using sample sizes of 30, 60, 100, and 150, and missing percentages of 10%, 20%, 30%, and 40%. The average mean square error (AMSE) was used to compare efficiency. The results reveal that the proposed composite approaches outperformed the single ones, particularly the three-component method KSREW. Increasing the number of components to four, on the other hand, had no effect on imputation performance.
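The abstract does not spell out how the composite methods or the AMSE are computed, so the following is only a minimal sketch under stated assumptions: "equivalent weight" is read as a plain average of the component imputations, missingness is applied completely at random to the response only, and AMSE is taken as the mean squared imputation error averaged over simulation replications. The function names (`sr_impute`, `knn_impute`, `amse`), the regression coefficients, and `k = 5` are hypothetical choices for illustration, and only two components (SR and KNN) are combined to keep the example short.

```python
import numpy as np

def sr_impute(X, y, miss, rng):
    """Stochastic regression: OLS prediction plus a random normal residual."""
    obs = ~miss
    A = np.column_stack([np.ones(len(y)), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    resid = y[obs] - A[obs] @ beta
    dof = max(obs.sum() - A.shape[1], 1)             # residual degrees of freedom
    sigma = np.sqrt(resid @ resid / dof)
    return A[miss] @ beta + rng.normal(0.0, sigma, miss.sum())

def knn_impute(X, y, miss, k=5):
    """KNN: mean response of the k nearest complete cases in predictor space."""
    obs = ~miss
    dist = np.linalg.norm(X[miss][:, None, :] - X[obs][None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y[obs][nearest].mean(axis=1)

def amse(n=60, miss_rate=0.2, reps=200, seed=0):
    """Average, over replications, of the MSE between imputed and true values."""
    rng = np.random.default_rng(seed)
    totals = {"SR": 0.0, "KNN": 0.0, "composite": 0.0}
    for _ in range(reps):
        # Assumed data-generating model: two predictors, normal errors.
        X = rng.normal(size=(n, 2))
        y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)
        # Assumed missingness: responses deleted completely at random.
        miss = np.zeros(n, dtype=bool)
        miss[rng.choice(n, size=int(round(miss_rate * n)), replace=False)] = True
        sr = sr_impute(X, y, miss, rng)
        knn = knn_impute(X, y, miss)
        imputations = {"SR": sr, "KNN": knn, "composite": (sr + knn) / 2}
        for name, values in imputations.items():
            totals[name] += np.mean((values - y[miss]) ** 2)
    return {name: total / reps for name, total in totals.items()}

result = amse()
print(result)
```

Because squared error is convex, the equal-weight composite in this sketch can never have a larger MSE than the average of its components' MSEs, which is one plausible reason a composite might compare well against single methods. Whether that matches the paper's reported ranking depends on details the abstract does not give.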
Pages: 3390-3399
Page count: 10