Improving deep learning performance with missing values via deletion and compensation

Cited by: 0
Authors
Adrián Sánchez-Morales
José-Luis Sancho-Gómez
Juan-Antonio Martínez-García
Aníbal R. Figueiras-Vidal
Affiliations
[1] Universidad Politécnica de Cartagena, Departamento de Tecnologías de la Información y las Comunicaciones
[2] Universidad Carlos III de Madrid, Departamento de Teoría de la Señal y Comunicaciones
Source
Neural Computing and Applications | 2020 / Vol. 32
Keywords
Missing values; Imputation; Classification; Deep learning
DOI
Not available
Abstract
Missing values are one of the most common difficulties in real-world datasets. Many machine-learning techniques have been proposed in the literature to address this problem. In this work, the representation capability of stacked denoising auto-encoders is used to obtain a new method for imputing missing values based on two ideas: deletion and compensation. The method improves imputation performance by artificially deleting values in the input features and using them as targets during training. However, although this deletion is shown to be highly effective, it may cause a mismatch between the distributions of the training and test sets. To solve this issue, a compensation mechanism is proposed, based on a slight modification of the error function to be optimized. Experiments on several datasets show that deletion and compensation yield improvements not only in imputation but also in classification, compared with other classical techniques.
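The deletion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the zero-filling of hidden entries, and the weight `alpha` are assumptions made for illustration only. In particular, the abstract does not specify the form of the compensation term, so the weighted error below is only a stand-in showing where a modified error function would enter.

```python
import numpy as np

def delete_and_mask(X, observed, drop_rate, rng):
    """Artificially delete a fraction of the observed entries.

    Returns the corrupted input (missing and deleted entries set to 0,
    an assumed placeholder) and a boolean mask of the entries deleted on
    purpose, whose original values can be used as training targets.
    """
    drop = (rng.random(X.shape) < drop_rate) & observed
    X_in = np.where(observed & ~drop, X, 0.0)
    return X_in, drop

def weighted_reconstruction_error(X_hat, X, observed, deleted, alpha):
    """Mean squared error over known entries only.

    Entries that were artificially deleted get weight `alpha` (hypothetical
    parameter); truly missing entries are ignored because no target exists.
    """
    kept = observed & ~deleted
    err = (X_hat - X) ** 2
    total = err[kept].sum() + alpha * err[deleted].sum()
    norm = kept.sum() + alpha * deleted.sum()
    return total / norm if norm > 0 else 0.0

# Usage: NaN marks values that are genuinely missing in the data.
rng = np.random.default_rng(0)
X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0]])
observed = ~np.isnan(X)
X_in, deleted = delete_and_mask(X, observed, drop_rate=0.5, rng=rng)
# X_in would be fed to the auto-encoder; here it stands in for its output.
loss = weighted_reconstruction_error(X_in, X, observed, deleted, alpha=2.0)
```

In a real training loop, `X_hat` would be the reconstruction produced by the auto-encoder from `X_in`, and a fresh deletion mask would typically be drawn for each batch.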
Pages: 13233-13244
Page count: 11