Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data

Cited by: 0
Authors
Thongsri, Thidarat [1, 2]
Samart, Klairung [1, 2]
Affiliations
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Hat Yai, Thailand
[2] Prince Songkla Univ, Fac Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
Composite method; imputation method; missing data; missing at random; multiple linear regression; hot deck imputation
DOI
Not available
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Missing data are common in data collection, and ignoring the problem can lead to unreliable conclusions. Our research objective is to develop a method for handling data that are missing at random in both the response and the independent variables of a multiple linear regression, and to compare its efficiency with that of existing techniques. Five existing techniques were considered: listwise deletion (LD), hot deck imputation (HD), predictive mean matching imputation (PMM), stochastic regression imputation (SR), and random forest imputation (RF). We compare them with the proposed composite imputation method, stochastic regression random forest with equivalent weight (SREW), which combines the stochastic regression and random forest methods using the equivalent weight method. Monte Carlo simulations were run with sample sizes of 30, 60, 90, 120 and 150, missing percentages of 10%, 20%, 30% and 40%, and error standard deviations of 1, 3 and 5. Efficiency is compared using the average mean square error (AMSE). The results show that SREW is the most efficient method in all situations, whereas hot deck imputation gives the highest AMSE in almost all cases, especially when the missing percentage is high.
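The idea of a composite imputation with equal weights can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes missingness in the response only (the abstract covers both response and independent variables), it uses a k-nearest-neighbour average as a NumPy-only stand-in for the random forest imputer, and all function names, coefficients, and the missingness mechanism are hypothetical choices made for illustration.

```python
import numpy as np

def stochastic_regression_impute(X, y, rng):
    """Fill NaN entries of y with an OLS prediction plus a random normal residual."""
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    resid = y[obs] - A[obs] @ beta
    dof = max(obs.sum() - A.shape[1], 1)            # residual degrees of freedom
    sigma = np.sqrt(resid @ resid / dof)
    filled = y.copy()
    n_miss = int((~obs).sum())
    filled[~obs] = A[~obs] @ beta + rng.normal(0.0, sigma, size=n_miss)
    return filled

def knn_mean_impute(X, y, k=5):
    """Stand-in for a random forest imputer (assumption): mean of the k nearest observed y."""
    obs = ~np.isnan(y)
    filled = y.copy()
    for i in np.flatnonzero(~obs):
        dist = np.linalg.norm(X[obs] - X[i], axis=1)
        nearest = np.argsort(dist)[:k]
        filled[i] = y[obs][nearest].mean()
    return filled

def equal_weight_composite(filled_a, filled_b):
    """Combine two completed vectors with weights 1/2 and 1/2."""
    return 0.5 * filled_a + 0.5 * filled_b

# Hypothetical simulated data: n = 60, error standard deviation 1.
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 2))
y_full = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 1.0, size=n)

# MAR-style mechanism (assumption): missingness in y depends only on the observed X[:, 0].
p_miss = 0.4 / (1.0 + np.exp(-X[:, 0]))
miss = rng.random(n) < p_miss
y_obs = y_full.copy()
y_obs[miss] = np.nan

sr_filled = stochastic_regression_impute(X, y_obs, rng)
nn_filled = knn_mean_impute(X, y_obs)
composite = equal_weight_composite(sr_filled, nn_filled)
```

Observed values pass through unchanged, because both imputers leave them untouched and the two weights sum to one; only the imputed entries are averaged. Replacing `knn_mean_impute` with an actual random forest imputer would bring the sketch closer to the SR and RF combination described in the abstract.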
Pages: 51-62
Page count: 12