Development of Imputation Methods for Missing Data in Multiple Linear Regression Analysis

Cited by: 4
Authors
Thongsri, Thidarat [1]
Samart, Klairung [1]
Affiliation
[1] Prince Songkla Univ, Fac Sci, Div Computat Sci, Stat & Applicat Res Unit, Hat Yai, Thailand
Keywords
missing data; imputation method; composite method; multiple linear regression; hot deck imputation
DOI
10.1134/S1995080222140323
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Missing data is a common issue in many domains of study, and disregarding it may lead to erroneous conclusions. This study's objective is to develop and compare the efficiency of eight imputation methods: hot deck imputation (HD), k-nearest neighbors imputation (KNN), stochastic regression imputation (SR), predictive mean matching imputation (PMM), random forest imputation (RF), stochastic regression random forest with equivalent weight imputation (SREW), k-nearest random forest with equivalent weight imputation (KREW), and k-nearest stochastic regression and random forest with equivalent weight imputation (KSREW). The simulation was run with sample sizes of 30, 60, 100, and 150 and missing percentages of 10%, 20%, 30%, and 40%, and the average mean square error (AMSE) was used to compare efficiency. The results reveal that the proposed composite methods outperformed the single ones, particularly the three-component method KSREW. Increasing the number of components to a four-component method, on the other hand, had no effect on imputation performance.
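The abstract names the methods but does not describe how they work. Below is a minimal Python/NumPy sketch of the general idea behind an equal-weight composite imputer, offered for illustration only and not as the authors' implementation. Two single imputers (KNN and stochastic regression) fill the same missing cells, and the composite averages their completed values. The sketch rests on several assumptions. "Equivalent weight" is read as a simple average of the component imputations. Values are missing completely at random in the response y only. AMSE is taken to be the squared error between imputed and true values at the missing cells, averaged over simulation replicates; the paper may define it on a different quantity. The random forest, PMM, and hot deck components are omitted for brevity. All function names (knn_impute, stochastic_regression_impute, equal_weight_composite, amse) and the data-generating model are hypothetical.

```python
import numpy as np


def knn_impute(X, y, k=5):
    """Fill each NaN in y with the mean y of its k nearest complete cases (Euclidean distance on X)."""
    y_imp = y.copy()
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    for i in np.flatnonzero(~obs):
        dist = np.linalg.norm(X_obs - X[i], axis=1)
        nearest = np.argsort(dist)[:k]
        y_imp[i] = y_obs[nearest].mean()
    return y_imp


def stochastic_regression_impute(X, y, rng):
    """Fill NaNs in y with OLS predictions plus a normal residual draw."""
    y_imp = y.copy()
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    resid = y[obs] - A[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - A.shape[1]))  # residual SD estimate
    miss = ~obs
    y_imp[miss] = A[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return y_imp


def equal_weight_composite(*completed):
    """Assumed reading of 'equivalent weight': average the completed vectors with equal weights."""
    return np.mean(np.stack(completed), axis=0)


def amse(n=30, miss_rate=0.1, reps=200, seed=0):
    """Monte Carlo AMSE (assumed definition): mean squared imputation error on missing cells, averaged over replicates."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, -1.0])  # hypothetical data-generating coefficients
    totals = {"KNN": 0.0, "SR": 0.0, "composite": 0.0}
    for _ in range(reps):
        X = rng.normal(size=(n, 2))
        y_true = 1.0 + X @ beta_true + rng.normal(size=n)
        y = y_true.copy()
        n_miss = max(1, int(round(miss_rate * n)))
        miss = rng.choice(n, size=n_miss, replace=False)  # MCAR missingness in y only
        y[miss] = np.nan
        fills = {"KNN": knn_impute(X, y),
                 "SR": stochastic_regression_impute(X, y, rng)}
        fills["composite"] = equal_weight_composite(fills["KNN"], fills["SR"])
        for name, y_imp in fills.items():
            totals[name] += np.mean((y_imp[miss] - y_true[miss]) ** 2)
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    # Smallest sample size in the study's grid, at each of its four missing rates
    for rate in (0.1, 0.2, 0.3, 0.4):
        print(rate, amse(n=30, miss_rate=rate))
```

Running the module prints one AMSE per method for n = 30 at the four missing rates used in the study. Passing a third completed vector to equal_weight_composite would mirror the step from a two-component to a three-component method such as KSREW.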
Pages: 3390-3399
Number of pages: 10