A review on missing values for main challenges and methods

Cited by: 19
Authors
Ren, Lijuan [1]
Wang, Tao [2]
Seklouli, Aicha Sekhari [3]
Zhang, Haiqing [1]
Bouras, Abdelaziz [4]
Affiliations
[1] Chengdu Univ Informat Technol, Sch Software Engn, Chengdu 610225, Peoples R China
[2] Univ Lyon 1, Univ Lyon 2, Univ Jean Monnet St Etienne, Univ Lyon,DISP UR4570,INSA Lyon, F-42300 Roanne, France
[3] Univ Lyon 1, Univ Lyon 2, Univ Lyon, DISP UR4570,INSA Lyon, F-69676 Bron, France
[4] Qatar Univ, Coll Engn, CSE, Doha 2713, Qatar
Keywords
Missing values; Imputation; Deletion; Missing mechanism; Machine learning; MULTIPLE IMPUTATION; TIME-SERIES; DECISION TREES; LINEAR-MODELS; DATA SETS; IMPACT; CLASSIFICATION; REGRESSION; ILLUSTRATION; NONRESPONSE
DOI
10.1016/j.is.2023.102268
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Several recent reviews summarize common methods for analyzing missing values. However, none provides a systematic, in-depth summary of the analytical challenges that missing values pose and of the solutions to them. To guide the handling of missing values, this review consolidates recent developments in missing-value research. In particular, we comprehensively investigated state-of-the-art missing-value solutions and systematically studied the main challenges in missing-value analysis (missing mechanisms, missing patterns, and missing rates). We also reviewed 63 publications that compare different strategies for deleting and imputing missing values. For these papers, we examined the data characteristics, highlighted three main problems in analyzing missing values, and analyzed the performance of the missing-value solutions they study. In addition, we conducted comprehensive experiments on 9 public datasets using typical missing-value processing methods and provide a simple decision tree to guide the handling of missing values. Finally, we describe current research hotspots and open challenges, which suggest potential research topics.
Pages: 23
Related papers
105 in total
  • [1] Adhikari Deepak, 2021, Microprocessors and Microsystems, DOI 10.1016/j.micpro.2020.103636
  • [2] Dealing with missing values in large-scale studies: microarray data imputation and beyond
    Aittokallio, Tero
    [J]. BRIEFINGS IN BIOINFORMATICS, 2010, 11 (02) : 253 - 264
  • [3] Missing Values Imputation based on Fuzzy C-Means Algorithm for Classification of Chronic Obstructive Pulmonary Disease (COPD)
    Aristiawati, Kiki
    Siswantining, Titin
    Sarwinda, Devvi
    Soemartojo, Saskya Mary
    [J]. PROCEEDINGS OF THE 8TH SEAMS-UGM INTERNATIONAL CONFERENCE ON MATHEMATICS AND ITS APPLICATIONS 2019: DEEPENING MATHEMATICAL CONCEPTS FOR WIDER APPLICATION THROUGH MULTIDISCIPLINARY RESEARCH AND INDUSTRIES COLLABORATIONS, 2019, 2192
  • [4] Bayesian modeling of missing data in clinical research
    Austin, PC
    Escobar, MD
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2005, 49 (03) : 821 - 836
  • [5] Analyzing weight loss intervention studies with missing data: Which methods should be used?
    Batterham, Marijka J.
    Tapsell, Linda C.
    Charlton, Karen E.
    [J]. NUTRITION, 2013, 29 (7-8) : 1024 - 1029
  • [6] LSimpute: accurate estimation of missing values in microarray data with least squares methods
    Bo, TH
    Dysvik, J
    Jonassen, I
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 (03) : e34
  • [7] Bono Christine, 2007, Res Social Adm Pharm, V3, P1, DOI 10.1016/j.sapharm.2006.04.001
  • [8] Multiple imputation was an efficient method for harmonizing the Mini-Mental State Examination with missing item-level data
    Burns, Richard A.
    Butterworth, Peter
    Kiely, Kim M.
    Bielak, Allison A. M.
    Luszcz, Mary A.
    Mitchell, Paul
    Christensen, Helen
    Von Sanden, Chwee
    Anstey, Kaarin J.
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2011, 64 (07) : 787 - 793
  • [9] Using link-preserving imputation for logistic partially linear models with missing covariates
    Chen, Qixuan
    Paik, Myunghee Cho
    Kim, Minjin
    Wang, Cuiling
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 101 : 174 - 185
  • [10] A simulation study using EFA and CFA programs based the impact of missing data on test dimensionality
    Chen, Shin-Feng
    Wang, Shuyi
    Chen, Chen-Yuan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (04) : 4026 - 4031