A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Cited by: 0
Authors
Baligh Al-Helali
Qi Chen
Bing Xue
Mengjie Zhang
Affiliation
[1] Victoria University of Wellington, School of Engineering and Computer Science
Source
Soft Computing | 2021 / Vol. 25
Keywords
Symbolic regression; Genetic programming; Incomplete data; KNN; Imputation
DOI
Not available
Abstract
Incompleteness is one of the problematic data quality challenges in real-world machine learning tasks. A large number of studies have been conducted for addressing this challenge. However, most of the existing studies focus on the classification task and only a limited number of studies for symbolic regression with missing values exist. In this work, a new imputation method for symbolic regression with incomplete data is proposed. The method aims to improve both the effectiveness and efficiency of imputing missing values for symbolic regression. This method is based on genetic programming (GP) and weighted K-nearest neighbors (KNN). It constructs GP-based models using other available features to predict the missing values of incomplete features. The instances used for constructing such models are selected using weighted KNN. The experimental results on real-world data sets show that the proposed method outperforms a number of state-of-the-art methods with respect to the imputation accuracy, the symbolic regression performance, and the imputation time.
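The abstract's two-stage idea (select training instances with weighted KNN, then fit a model on them to predict the missing feature) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature-weighted Euclidean distance, the function names `select_neighbours` and `impute_row`, and the parameters `weights` and `k` are all assumptions made for illustration, and ordinary least squares is used as a plain stand-in where the paper evolves a GP model.

```python
import numpy as np

def select_neighbours(X_complete, query, observed, weights, k):
    """Indices of the k complete rows nearest to `query`.

    Assumption: distance is a feature-weighted Euclidean distance
    computed only over the features observed in `query`.
    """
    diff = X_complete[:, observed] - query[observed]
    dist = np.sqrt((weights[observed] * diff ** 2).sum(axis=1))
    return np.argsort(dist)[:k]

def impute_row(X_complete, query, weights, k=5):
    """Fill NaNs in `query` using a model fitted on its k nearest complete rows.

    Stand-in: a linear least-squares fit replaces the GP-evolved model.
    """
    filled = query.copy()
    missing = np.isnan(query)
    observed = ~missing
    idx = select_neighbours(X_complete, query, observed, weights, k)
    # Design matrix: observed features of the selected rows plus an intercept.
    A = np.column_stack([X_complete[idx][:, observed], np.ones(len(idx))])
    for j in np.flatnonzero(missing):
        coef, *_ = np.linalg.lstsq(A, X_complete[idx, j], rcond=None)
        filled[j] = np.append(query[observed], 1.0) @ coef
    return filled
```

Restricting the fit to the selected neighbours is what keeps the per-value model small; swapping the least-squares call for a symbolic regressor would bring the sketch closer to the described method.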
Pages: 5993 / 6012
Page count: 19