Deep learning for missing value imputation of continuous data and the effect of data discretization

Cited by: 63
Authors
Lin, Wei-Chao [1, 2]
Tsai, Chih-Fong [3]
Zhong, Jia Rong [3]
Affiliations
[1] Chang Gung Univ, Dept Informat Management, Taoyuan, Taiwan
[2] Chang Gung Mem Hosp Linkou, Dept Thorac Surg, Taoyuan, Taiwan
[3] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Keywords
Data science; Machine learning; Deep learning; Missing value imputation; Data discretization; CLASSIFICATION; MACHINES
DOI
10.1016/j.knosys.2021.108079
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Often real-world datasets are incomplete and contain some missing attribute values. Furthermore, many data mining and machine learning techniques cannot directly handle incomplete datasets. Missing value imputation is the major solution for constructing a learning model to estimate specific values to replace the missing ones. Deep learning techniques have been employed for missing value imputation and demonstrated their superiority over many other well-known imputation methods. However, very few studies have attempted to assess the imputation performance of deep learning techniques for tabular or structured data with continuous values. Moreover, the effect on the imputation results when the continuous data need to be discretized has never been examined. In this paper, two supervised deep neural networks, i.e., multilayer perceptron (MLP) and deep belief networks (DBN), are compared for missing value imputation. Moreover, two differently ordered combinations of data discretization and imputation steps are examined. The results show that MLP and DBN significantly outperform the baseline imputation methods based on the mean, KNN, CART, and SVM, with DBN performing the best. On the other hand, when considering the discretization of continuous data, the order in which the two steps are combined is not the most important, but rather, the chosen imputation algorithm. That is, the final performance is much better when using DBN for imputation, regardless of whether discretization is performed in the first or second step, than the other imputation methods. © 2021 Elsevier B.V. All rights reserved.
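The two orderings of discretization and imputation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: mean imputation (one of the baselines named above) stands in for the imputation step, while equal-width binning and the most-frequent-bin fill used after discretization are assumptions chosen only to keep the example short. All function names are hypothetical; a learned imputer such as an MLP or DBN would take the place of `mean_impute`.

```python
from collections import Counter
from statistics import mean


def mean_impute(col):
    """Replace None with the mean of the observed continuous values."""
    observed = [v for v in col if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in col]


def equal_width_bins(col, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1; None stays None.

    Equal-width binning is an assumption; the abstract does not name
    the discretization method.
    """
    observed = [v for v in col if v is not None]
    lo, hi = min(observed), max(observed)
    width = ((hi - lo) / n_bins) or 1.0  # guard against a constant column
    return [
        None if v is None else min(int((v - lo) / width), n_bins - 1)
        for v in col
    ]


def mode_impute(col):
    """Replace None with the most frequent observed bin (assumed stand-in)."""
    observed = [v for v in col if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in col]


def impute_then_discretize(col, n_bins=3):
    # Order 1: fill continuous values first, then bin everything.
    return equal_width_bins(mean_impute(col), n_bins)


def discretize_then_impute(col, n_bins=3):
    # Order 2: bin the observed values first, then fill missing bins.
    return mode_impute(equal_width_bins(col, n_bins))


column = [1.0, 2.0, None, 9.0, 10.0, 10.5]
print(impute_then_discretize(column))
print(discretize_then_impute(column))
```

On this toy column the two orders assign the missing entry to different bins, which is the kind of difference the study measures across imputation algorithms.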
Pages: 9