Deep learning for missing value imputation of continuous data and the effect of data discretization

Cited by: 62

Authors:

Lin, Wei-Chao [1,2]

Tsai, Chih-Fong [3]

Zhong, Jia Rong [3]

Affiliations:

[1] Chang Gung Univ, Dept Informat Management, Taoyuan, Taiwan

[2] Chang Gung Mem Hosp Linkou, Dept Thorac Surg, Taoyuan, Taiwan

[3] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan

Source:

KNOWLEDGE-BASED SYSTEMS | 2022 / Vol. 239

Keywords:

Data science; Machine learning; Deep learning; Missing value imputation; Data discretization; CLASSIFICATION; MACHINES;

DOI:

10.1016/j.knosys.2021.108079

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Real-world datasets are often incomplete and contain missing attribute values. Furthermore, many data mining and machine learning techniques cannot directly handle incomplete datasets. Missing value imputation is the major solution: a learning model is constructed to estimate specific values that replace the missing ones. Deep learning techniques have been employed for missing value imputation and have demonstrated their superiority over many other well-known imputation methods. However, very few studies have attempted to assess the imputation performance of deep learning techniques for tabular or structured data with continuous values. Moreover, the effect on the imputation results when the continuous data need to be discretized has never been examined. In this paper, two supervised deep neural networks, i.e., multilayer perceptron (MLP) and deep belief networks (DBN), are compared for missing value imputation. In addition, two differently ordered combinations of the data discretization and imputation steps are examined. The results show that MLP and DBN significantly outperform the baseline imputation methods based on the mean, KNN, CART, and SVM, with DBN performing the best. When the continuous data are discretized, the order in which the two steps are combined matters less than the choice of imputation algorithm. That is, the final performance is much better when DBN is used for imputation than with the other imputation methods, regardless of whether discretization is performed in the first or second step. (c) 2021 Elsevier B.V. All rights reserved.
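To make the two orderings in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes simple stand-ins: mean imputation (mode imputation once values are bins) in place of MLP or DBN, and equal-width binning as the discretizer. The function names, the toy column, and the choice of three bins are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def mode_impute(column):
    """Replace None entries with the most frequent observed bin.
    Ties are broken in favor of the smallest bin index."""
    observed = [v for v in column if v is not None]
    fill = max(sorted(set(observed)), key=observed.count)
    return [fill if v is None else v for v in column]

def equal_width_bins(column, n_bins=3):
    """Map each observed value to a bin index in 0..n_bins-1; None stays None."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [
        None if v is None else min(int((v - lo) / width), n_bins - 1)
        for v in column
    ]

# Toy continuous attribute with two missing entries (hypothetical data).
column = [1.0, 2.0, None, 4.0, 9.0, None, 8.0]

# Ordering 1: impute the continuous values, then discretize.
impute_then_discretize = equal_width_bins(mean_impute(column))

# Ordering 2: discretize the observed values, then impute the bins.
discretize_then_impute = mode_impute(equal_width_bins(column))

print(impute_then_discretize)
print(discretize_then_impute)
```

With these simple stand-ins the two pipelines can assign different bins to the missing entries. This toy example only shows what the two orderings mean; it says nothing about the paper's experimental results, which concern learned imputers such as MLP and DBN.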


Pages: 9

50 items in total

[1] Data discretization impact on deep learning for missing value imputation of continuous data
Kumar, Talluri Sunil
Neelakantan, P.
Reddy, G. Suresh
Gangappa, Malige
Rajasekar, M.
INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2025, 23 (01)
[2] Combining data discretization and missing value imputation for incomplete medical datasets
Huang, Min-Wei
Tsai, Chih-Fong
Tsui, Shu-Ching
Lin, Wei-Chao
PLOS ONE, 2023, 18 (11)
[3] Missing-Value Imputation of Continuous Missing Based on Deep Imputation Network Using Correlations among Multiple IoT Data Streams in a Smart Space
Lee, Minseok
An, Jihoon
Lee, Younghee
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (02) : 289 - 298
[4] Missing Value Imputation: With Application to Handwriting Data
Xu, Zhen
Srihari, Sargur N.
DOCUMENT RECOGNITION AND RETRIEVAL XXII, 2015, 9402
[5] Hybrid prediction model with missing value imputation for medical data
Purwar, Archana
Singh, Sandeep Kumar
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) : 5621 - 5631
[6] Missing value imputation in food composition data with denoising autoencoders
Gjorshoska, Ivana
Eftimov, Tome
Trajanov, Dimitar
JOURNAL OF FOOD COMPOSITION AND ANALYSIS, 2022, 112
[7] Generative adversarial learning for missing data imputation
Xinyang Wang
Hongyu Chen
Jiayu Zhang
Jicong Fan
Neural Computing and Applications, 2025, 37 (3) : 1403 - 1416
[8] Transformers deep learning models for missing data imputation: an application of the ReMasker model on a psychometric scale
Casella, Monica
Milano, Nicola
Dolce, Pasquale
Marocco, Davide
FRONTIERS IN PSYCHOLOGY, 2024, 15
[9] The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning
Lall, Ranjit
Robinson, Thomas
POLITICAL ANALYSIS, 2022, 30 (02) : 179 - 196
[10] Long-term missing value imputation for time series data using deep neural networks
Park, Jangho
Muller, Juliane
Arora, Bhavna
Faybishenko, Boris
Pastorello, Gilberto
Varadharajan, Charuleka
Sahu, Reetik
Agarwal, Deborah
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (12) : 9071 - 9091
