Systematic Review of Using Machine Learning in Imputing Missing Values

Cited by: 47
Authors
Alabadla, Mustafa [1]
Sidi, Fatimah [1]
Ishak, Iskandar [1]
Ibrahim, Hamidah [1]
Affendey, Lilly Suriani [1]
Ani, Zafienas Che [1]
Jabar, Marzanah A. [2]
Bukar, Umar Ali [2, 3]
Devaraj, Navin Kumar [4]
Muda, Ahmad Sobri [5]
Tharek, Anas [5]
Omar, Noritah [6]
Jaya, M. Izham Mohd [7]
Affiliations
[1] Univ Putra Malaysia UPM, Fac Comp Sci & Informat Technol, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
[2] Univ Putra Malaysia UPM, Fac Comp Sci & Informat Technol, Dept Software Engn & Informat Syst, Serdang 43400, Selangor, Malaysia
[3] Taraba State Univ, Dept Math Sci, Comp Sci Unit, Jalingo 00234, Nigeria
[4] Univ Putra Malaysia UPM, Fac Med & Hlth Sci, Dept Family Med, Serdang 43400, Selangor, Malaysia
[5] Univ Putra Malaysia UPM, Fac Med & Hlth Sci, Dept Radiol, Serdang 43400, Selangor, Malaysia
[6] Univ Putra Malaysia UPM, Fac Modern Languages & Commun, Dept English, Serdang 43400, Selangor, Malaysia
[7] Univ Malaysia Pahang UMP, Fac Comp, Dept Software Engn, Pekan 26600, Pahang, Malaysia
Keywords
Systematics; Bibliographies; Data integrity; Data mining; Computer science; STEM; Market research; Systematic literature review; data imputation; data mining; missingness; data preprocessing; data quality; VALUE IMPUTATION; CLASSIFICATION; NETWORK
DOI
10.1109/ACCESS.2022.3160841
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Missing data are a pervasive data quality problem in many domains and can lead to misleading analyses and inaccurate decisions. Much research has investigated the mechanisms of missing data and the techniques appropriate for handling various data types. In the last decade, machine learning has been used in place of conventional methods to address missing values more efficiently. Studying and analyzing recently proposed machine learning methods makes it possible to highlight important gains in accuracy, performance, and time consumption. This study aims to help data analysts and researchers address the limitations of machine learning imputation methods by conducting a systematic literature review that provides a comprehensive overview of using such methods to impute missing values. Newly proposed machine learning approaches to data imputation are analyzed and summarized to help researchers select a suitable method based on several factors and settings. The review covers studies published between 2016 and 2021 on adopting machine learning to impute missing values, focusing on their strengths and limitations. A total of 684 research articles retrieved from various scientific databases were analyzed, and 94 of them were selected as primary studies. Finally, several recommendations are given to guide future researchers in applying machine learning to impute missing values.
Pages: 44483-44502
Number of pages: 20