Data Imputation for Symbolic Regression with Missing Values: A Comparative Study

Citations: 0
Authors
Al-Helali, Baligh [1]
Chen, Qi [1]
Xue, Bing [1]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
Source
2020 IEEE Symposium Series on Computational Intelligence (SSCI) | 2020
Keywords
symbolic regression; genetic programming; incomplete data; imputation; predictor selection
DOI
10.1109/ssci47803.2020.9308216
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Symbolic regression via genetic programming is considered a crucial machine learning tool for empirical modelling. However, real-world data sets commonly have data quality problems such as noise, outliers, and missing values. Although several approaches can be adopted to deal with data incompleteness in machine learning, most studies consider classification tasks, and only a few have considered symbolic regression with missing values. In this work, the performance of symbolic regression using genetic programming on real-world data sets that have missing values is investigated. This is done by studying how different imputation methods affect symbolic regression performance. The experiments are conducted using thirteen real-world incomplete data sets with different ratios of missing values. The experimental results show that although the performance of the imputation methods differs with the data set, CART has a better effect than the others. This might be due to its ability to deal with both categorical and numerical variables. Moreover, imputation methods are observed to be superior to the commonly used deletion strategy.
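The contrast between deletion and imputation described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the toy data, the function names, and the use of column-mean imputation (a simple stand-in for the CART-based and other imputers compared in the paper) are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: rows of (x1, x2, y); None marks a missing value.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, None, 7.0),
    (None, 1.0, 4.0),
    (4.0, 3.0, 11.0),
]


def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [r for r in data if all(v is not None for v in r)]


def mean_imputation(data):
    """Replace each missing value with the mean of the observed values
    in the same column (a simple stand-in for more capable imputers)."""
    columns = list(zip(*data))
    col_means = [mean([v for v in col if v is not None]) for col in columns]
    return [
        tuple(col_means[j] if v is None else v for j, v in enumerate(r))
        for r in data
    ]


deleted = listwise_deletion(rows)  # keeps only the fully observed rows
imputed = mean_imputation(rows)    # keeps every row, with gaps filled in

print(len(deleted), "rows remain after deletion")
print(len(imputed), "rows remain after imputation")
```

Deletion discards half of this toy data set, while imputation keeps every instance available for the downstream regression step, which is one plausible reason an imputation-based pipeline can outperform deletion.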
Pages: 2093-2100
Page count: 8