Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data

Cited by: 0

Authors:

Thongsri, Thidarat ^{[1,2]}

Samart, Klairung ^{[1,2]}

Affiliations:

[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Hat Yai, Thailand

[2] Prince Songkla Univ, Fac Sci, Stat & Applicat Res Unit, Hat Yai, Thailand

Source:

INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE | 2022 / Vol. 17 / No. 1

Keywords:

Composite method; imputation method; missing data; missing at random; multiple linear regression; HOT DECK IMPUTATION;

DOI:

Not available

Chinese Library Classification:

O1 [Mathematics];

Subject classification code:

0701 ; 070101 ;

Abstract:

Missing data are common in the data collection process, and ignoring the problem can lead to unreliable conclusions. Our research objective is to develop a method for handling data that are missing at random in both the response and the independent variables of a multiple linear regression, and to compare its efficiency with existing techniques. Five existing techniques were employed: listwise deletion (LD), hot deck imputation (HD), predictive mean matching imputation (PMM), stochastic regression imputation (SR), and random forest imputation (RF). We compare them with the proposed composite imputation method, stochastic regression random forest with equivalent weight (SREW), which combines the stochastic regression and random forest methods using the equivalent weighted method. Monte Carlo simulations were run with sample sizes of 30, 60, 90, 120, and 150, missing percentages of 10%, 20%, 30%, and 40%, and error standard deviations of 1, 3, and 5. Efficiency is compared using the average mean square error (AMSE). The results show that SREW is the most efficient method in all situations, whereas hot deck imputation gives the highest AMSE in almost all cases, especially when the missing percentage is high.
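
The combination step and the comparison criterion described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that "equivalent weight" means each method receives a weight of 0.5, that the MSE is a squared error between reference values and estimates, and that the AMSE is the plain average of per-replication MSE values. The function names and the numbers are hypothetical.

```python
from statistics import fmean


def equal_weight_composite(sr_values, rf_values, weight=0.5):
    """Combine two vectors of imputed values elementwise.

    Assumption: "equivalent weight" is read as weight = 0.5 for each method.
    """
    if len(sr_values) != len(rf_values):
        raise ValueError("both imputation vectors must have the same length")
    return [weight * s + (1.0 - weight) * r for s, r in zip(sr_values, rf_values)]


def mse(reference, estimates):
    """Mean square error between two equal-length sequences (assumed form)."""
    return fmean((t - e) ** 2 for t, e in zip(reference, estimates))


def amse(mse_per_replication):
    """Average the MSE over Monte Carlo replications (assumed form)."""
    return fmean(mse_per_replication)


# Hypothetical imputed values for three missing cells (illustrative only).
sr_imputed = [2.0, 4.0, 6.0]  # as if produced by stochastic regression
rf_imputed = [4.0, 4.0, 2.0]  # as if produced by random forest
composite = equal_weight_composite(sr_imputed, rf_imputed)
```

In a full simulation, the two input vectors would come from fitted stochastic regression and random forest imputers, and `amse` would be evaluated for each combination of sample size, missing percentage, and error standard deviation.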


Pages: 51-62

Page count: 12

References (22 in total):

[1] Estimating extremely large amounts of missing precipitation data [J].