RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming

Cited by: 8
Authors
Lopez, Uriel [1 ]
Trujillo, Leonardo [1 ]
Martinez, Yuliana [1 ]
Legrand, Pierrick [2 ,3 ,4 ]
Naredo, Enrique [5 ]
Silva, Sara [6 ,7 ]
Affiliations
[1] Inst Tecnol Tijuana, Posgrad Ciencias Ingn, Unidad Otay, Blvd Ind,Ave ITR Tijuana S-N, Tijuana 22500, BC, Mexico
[2] Univ Bordeaux, 3Ter Pl Victoire, F-33076 Bordeaux, France
[3] CNRS, UMR 5251, IMB, 351 Cours Liberat, F-33405 Talence, France
[4] INRIA Bordeaux Sud Ouest, 200 Rue Vieille Tour, F-33405 Talence, France
[5] Ctr Invest Geog & Geomat CentroGeo, Lab Nacl Geointeligencia GeoINT, Aguascalientes, Mexico
[6] Univ Lisbon, Fac Ciencias, Dept Informat, BioISI Biosyst & Integrat Sci Inst, P-1749016 Lisbon, Portugal
[7] Univ Coimbra, Dept Informat Engn, CISUC, Coimbra, Portugal
Source
GENETIC PROGRAMMING, EUROGP 2017 | 2017 / Vol. 10196
Keywords
Genetic programming; RANSAC; Robust regression; Outliers; Trimmed squares regression
DOI
10.1007/978-3-319-55696-3_8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Genetic programming (GP) has been shown to be a powerful tool for automatic modeling and program induction. It is often used to solve difficult symbolic regression tasks, with many examples in real-world domains. However, the robustness of GP-based approaches has not been substantially studied. In particular, the present work deals with the issue of outliers: data in the training set that represent severe errors in the measuring process. In general, a datum is considered an outlier when it sharply deviates from the true behavior of the system of interest. GP practitioners know that such data points usually bias the search and produce inaccurate models. Therefore, this work presents a hybrid methodology based on the RAndom SAmpling Consensus (RANSAC) algorithm and GP, which we call RANSAC-GP. RANSAC is an approach to deal with outliers in parameter estimation problems, widely used in computer vision and related fields. This work presents the first application of RANSAC to symbolic regression with GP, with impressive results. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the proportion of outliers reaches 90%.
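The generic RANSAC loop that the abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' RANSAC-GP implementation: the model-fitting step here is an ordinary quadratic least-squares fit (`numpy.polyfit`) standing in for a GP run, and the function name `ransac_fit`, the synthetic data, the 50% contamination level, the inlier threshold, and the iteration count are all assumptions chosen for the example.

```python
import numpy as np

def ransac_fit(x, y, fit, predict, sample_size, threshold, iterations, rng):
    """Generic RANSAC loop (hypothetical helper, not from the paper).

    Repeatedly fits a model to a small random subset and keeps the model
    whose consensus set (points with residual below `threshold`) is largest.
    """
    best_model = None
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(iterations):
        # Draw a minimal random sample without replacement.
        idx = rng.choice(len(x), size=sample_size, replace=False)
        model = fit(x[idx], y[idx])
        # Consensus set: all points the candidate model explains well.
        inliers = np.abs(predict(model, x) - y) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    if best_model is not None:
        # Refit on the whole consensus set for a more stable final model.
        best_model = fit(x[best_inliers], y[best_inliers])
    return best_model, best_inliers

# Stand-in for the GP step (assumption): quadratic least squares.
fit = lambda xs, ys: np.polyfit(xs, ys, deg=2)
predict = lambda coeffs, xs: np.polyval(coeffs, xs)

# Synthetic data (assumption): y = 2x^2 - x + 1 with half the targets corrupted.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x**2 - x + 1.0 + rng.normal(0.0, 0.01, size=200)
is_outlier = np.zeros(200, dtype=bool)
is_outlier[:100] = True
y[is_outlier] = rng.uniform(-10.0, 10.0, size=100)

model, inliers = ransac_fit(x, y, fit, predict, sample_size=3,
                            threshold=0.1, iterations=200, rng=rng)
print("recovered coefficients:", model)
print("consensus set size:", int(inliers.sum()))
```

In a RANSAC-GP-style hybrid, the `fit` callable would presumably be replaced by a GP run on the sampled subset. Higher contamination levels need more iterations, because the chance of drawing a sample made up only of clean points shrinks quickly as the proportion of outliers grows.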
Pages: 114-130
Number of pages: 17
Related papers
20 in total
[1]   SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS [J].
Alfons, Andreas ;
Croux, Christophe ;
Gelper, Sarah .
ANNALS OF APPLIED STATISTICS, 2013, 7 (01) :226-248
[2]  
[Anonymous], 2009, Ransac for dummies
[3]  
Derpanis K.G., 2010, Image Rochester NY, V4, P2
[4]   RANDOM SAMPLE CONSENSUS - A PARADIGM FOR MODEL-FITTING WITH APPLICATIONS TO IMAGE-ANALYSIS AND AUTOMATED CARTOGRAPHY [J].
FISCHLER, MA ;
BOLLES, RC .
COMMUNICATIONS OF THE ACM, 1981, 24 (06) :381-395
[5]  
Fortin FA, 2012, J MACH LEARN RES, V13, P2171
[6]   Least trimmed squares regression, least median squares regression, and mathematical programming [J].
Giloni, A ;
Padberg, M .
MATHEMATICAL AND COMPUTER MODELLING, 2002, 35 (9-10) :1043-1060
[7]  
Goncalves Ivo, 2013, Genetic Programming. 16th European Conference (EuroGP 2013). Proceedings, P73, DOI 10.1007/978-3-642-37207-0_7
[8]  
Hast A., 2013, J WSCG
[9]   SYMBOLIC REGRESSION VIA GENETIC PROGRAMMING AS A DISCOVERY ENGINE: INSIGHTS ON OUTLIERS AND PROTOTYPES [J].
Kotanchek, Mark E. ;
Vladislavleva, Ekaterina Y. ;
Smits, Guido F. .
GENETIC PROGRAMMING THEORY AND PRACTICE VII, 2010, :55-+
[10]   ε-Lexicase Selection for Regression [J].
La Cava, William ;
Spector, Lee ;
Danai, Kourosh .
GECCO'16: PROCEEDINGS OF THE 2016 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2016, :741-748