Deep learning for missing value imputation of continuous data and the effect of data discretization

Cited by: 62
Authors
Lin, Wei-Chao [1 ,2 ]
Tsai, Chih-Fong [3 ]
Zhong, Jia Rong [3 ]
Affiliations
[1] Chang Gung Univ, Dept Informat Management, Taoyuan, Taiwan
[2] Chang Gung Mem Hosp Linkou, Dept Thorac Surg, Taoyuan, Taiwan
[3] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Keywords
Data science; Machine learning; Deep learning; Missing value imputation; Data discretization; Classification; Machines
DOI
10.1016/j.knosys.2021.108079
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-world datasets are often incomplete, with some attribute values missing, and many data mining and machine learning techniques cannot handle incomplete datasets directly. Missing value imputation, the principal solution, constructs a learning model that estimates values to replace the missing ones. Deep learning techniques have been employed for missing value imputation and have demonstrated superiority over many other well-known imputation methods. However, very few studies have assessed the imputation performance of deep learning techniques on tabular (structured) data with continuous values, and the effect on the imputation results when the continuous data need to be discretized has never been examined. In this paper, two supervised deep neural networks, the multilayer perceptron (MLP) and the deep belief network (DBN), are compared for missing value imputation. In addition, two differently ordered combinations of the data discretization and imputation steps are examined. The results show that MLP and DBN significantly outperform the baseline imputation methods based on the mean, KNN, CART, and SVM, with DBN performing best. When continuous data are discretized, the order in which the two steps are combined matters less than the choice of imputation algorithm: the final performance is much better with DBN imputation than with the other imputation methods, regardless of whether discretization is performed first or second. (c) 2021 Elsevier B.V. All rights reserved.
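The two step orderings mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's pipeline: it assumes a single column, uses mean and mode imputation as simple stand-ins for the MLP and DBN imputers, and uses equal-width binning as an assumed discretizer. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def discretize_equal_width(column, n_bins=3):
    """Map observed values to bin indices 0..n_bins-1; None stays None.
    Equal-width binning is an assumption, not the paper's discretizer."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant column: everything falls into bin 0
    return [None if v is None else min(int((v - lo) / width), n_bins - 1)
            for v in column]

def impute_mean(column):
    """Stand-in continuous imputer: fill None with the observed mean."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def impute_mode(column):
    """Stand-in discrete imputer: fill None with the most frequent label."""
    fill = Counter(v for v in column if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

def impute_then_discretize(column, n_bins=3):
    # Ordering A: estimate continuous values first, then bin everything.
    return discretize_equal_width(impute_mean(column), n_bins)

def discretize_then_impute(column, n_bins=3):
    # Ordering B: bin the observed values first, then fill missing labels.
    return impute_mode(discretize_equal_width(column, n_bins))

column = [1.0, 2.0, None, 4.0, 10.0, None]
print(impute_then_discretize(column))  # [0, 0, 1, 1, 2, 1]
print(discretize_then_impute(column))  # [0, 0, 0, 1, 2, 0]
```

On this toy column the two orderings assign the missing entries to different bins, which is the kind of difference the paper's comparison is concerned with; with a learned imputer in place of the mean or mode, the gap would depend on the imputer's quality.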
Pages: 9
Related papers
50 records in total
  • [1] Data discretization impact on deep learning for missing value imputation of continuous data
    Kumar, Talluri Sunil
    Neelakantan, P.
    Reddy, G. Suresh
    Gangappa, Malige
    Rajasekar, M.
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2025, 23 (01)
  • [2] Combining data discretization and missing value imputation for incomplete medical datasets
    Huang, Min-Wei
    Tsai, Chih-Fong
    Tsui, Shu-Ching
    Lin, Wei-Chao
    PLOS ONE, 2023, 18 (11)
  • [3] Missing-Value Imputation of Continuous Missing Based on Deep Imputation Network Using Correlations among Multiple IoT Data Streams in a Smart Space
    Lee, Minseok
    An, Jihoon
    Lee, Younghee
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (02) : 289 - 298
  • [4] Missing Value Imputation: With Application to Handwriting Data
    Xu, Zhen
    Srihari, Sargur N.
    DOCUMENT RECOGNITION AND RETRIEVAL XXII, 2015, 9402
  • [5] Hybrid prediction model with missing value imputation for medical data
    Purwar, Archana
    Singh, Sandeep Kumar
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) : 5621 - 5631
  • [6] Missing value imputation in food composition data with denoising autoencoders
    Gjorshoska, Ivana
    Eftimov, Tome
    Trajanov, Dimitar
    JOURNAL OF FOOD COMPOSITION AND ANALYSIS, 2022, 112
  • [7] Generative adversarial learning for missing data imputation
    Wang, Xinyang
    Chen, Hongyu
    Zhang, Jiayu
    Fan, Jicong
    NEURAL COMPUTING & APPLICATIONS, 2025, 37 (03) : 1403 - 1416
  • [8] Transformers deep learning models for missing data imputation: an application of the ReMasker model on a psychometric scale
    Casella, Monica
    Milano, Nicola
    Dolce, Pasquale
    Marocco, Davide
    FRONTIERS IN PSYCHOLOGY, 2024, 15
  • [9] The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning
    Lall, Ranjit
    Robinson, Thomas
    POLITICAL ANALYSIS, 2022, 30 (02) : 179 - 196
  • [10] Long-term missing value imputation for time series data using deep neural networks
    Park, Jangho
    Muller, Juliane
    Arora, Bhavna
    Faybishenko, Boris
    Pastorello, Gilberto
    Varadharajan, Charuleka
    Sahu, Reetik
    Agarwal, Deborah
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (12) : 9071 - 9091