Instance selection for regression by discretization

Times cited: 30
Authors
Arnaiz-Gonzalez, Alvar [1]
Diez-Pastor, Jose F. [1]
Rodriguez, Juan J. [1]
Ignacio Garcia-Osorio, Cesar [1]
Affiliation
[1] Univ Burgos, Civil Engn, Escuela Politecn Super, Avda Cantabria S-N, Burgos 09006, Province Of Bur, Spain
Keywords
Instance selection; Regression; Mutual information; Noise filtering; Class noise; NEAREST-NEIGHBOR; PROTOTYPE SELECTION; MUTUAL INFORMATION; REDUCTION TECHNIQUES; GENETIC ALGORITHMS; VARIABLE SELECTION; NEURAL-NETWORKS; FUZZY-SYSTEMS; PREDICTION; NOISE
DOI
10.1016/j.eswa.2015.12.046
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An important step in building expert and intelligent systems is obtaining the knowledge they will use. This knowledge can be obtained from experts or, more often nowadays, from machine learning processes applied to large volumes of data. However, for some of these learning processes, if the volume of data is large, the knowledge extraction phase is very slow (or even impossible). Moreover, the data sets used for learning often come from measurement processes in which the collected data can contain errors, so the presence of noise in the data is inevitable. It is in such environments that an initial step of noise filtering and data set size reduction plays a fundamental role. For both tasks, instance selection emerges as a possible solution that has proved useful in various fields. In this paper we focus mainly on instance selection for noise removal. In addition, in contrast to most existing methods, which apply instance selection to classification tasks (discrete prediction), the proposed approach is used to obtain instance selection methods for regression tasks (prediction of continuous values). The different nature of the value to predict poses an extra difficulty, which explains the small number of articles on instance selection for regression. More specifically, the idea used in this article to adapt "classic" instance selection algorithms for classification to regression problems is as simple as discretizing the numerical output variable. In the experiments, the proposed method is compared with much more sophisticated methods specifically designed for regression, and it proves to be very competitive.
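The abstract does not say which discretization scheme or how many intervals the paper uses. As a minimal sketch, assuming equal-frequency (quantile) binning, the step of turning the numerical output variable into class labels could look like the following; `discretize_equal_frequency` and `n_bins` are hypothetical names, not taken from the paper.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize_equal_frequency(y: Sequence[float], n_bins: int) -> List[int]:
    """Turn continuous targets into class labels 0..n_bins-1 (hypothetical helper).

    Assumed scheme: cut points are taken at the i/n_bins quantiles of the sorted
    targets, so each bin receives roughly the same number of instances
    (repeated target values may unbalance the bins).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not y:
        return []
    ys = sorted(y)
    n = len(ys)
    # n_bins - 1 cut points taken from the sorted targets.
    cuts = [ys[(i * n) // n_bins] for i in range(1, n_bins)]
    # bisect_right sends a value equal to a cut point into the upper bin.
    return [bisect_right(cuts, v) for v in y]


labels = discretize_equal_frequency([0.1, 0.5, 0.9, 1.3, 2.0, 2.4], n_bins=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Once the targets are labels, any instance selection algorithm written for classification can be run unchanged; equal-width binning would be an equally plausible alternative.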
The main contributions of the paper include: (i) a simple way to adapt instance selection algorithms for classification to regression, (ii) the use of this approach to adapt a popular noise filter called ENN (edited nearest neighbor), and (iii) the comparison of this noise filter against two others specifically designed for regression, in which it proves very competitive despite its simplicity. (C) 2016 Elsevier Ltd. All rights reserved.
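A sketch of how an ENN filter could operate on discretized targets, under stated assumptions: Euclidean distance, k = 3 neighbors, a majority vote whose ties go to the label seen first among the nearest neighbors, and, in the usage part, a two-interval equal-width split of the target. `enn_filter` is a hypothetical name; this illustrates Wilson-style editing applied to discretized labels, not the authors' exact implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def enn_filter(X: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> List[int]:
    """Wilson-style Edited Nearest Neighbor (hypothetical helper): return the indices
    of instances whose label matches the majority label of their k nearest other instances."""
    n = len(X)
    keep = []
    for i in range(n):
        # Leave-one-out: Euclidean distance from instance i to every other instance.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        if not neighbour_labels:  # a lone instance has no neighbours to disagree with
            keep.append(i)
            continue
        # most_common orders equal counts by first appearance, so ties go to the
        # label that appears first in the distance-sorted neighbour list.
        predicted = Counter(neighbour_labels).most_common(1)[0][0]
        if predicted == labels[i]:
            keep.append(i)
    return keep


# Usage (toy data): one input feature, continuous targets.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
y = [1.0, 1.1, 0.9, 9.5, 10.0, 10.2, 9.8, 10.1]  # y[3] is a deliberately noisy target

# Assumed discretization: two equal-width intervals split at the middle of the target range.
threshold = (min(y) + max(y)) / 2
labels = [int(v >= threshold) for v in y]

kept = enn_filter(X, labels, k=3)
# Assumption: kept instances retain their original continuous targets for the regressor.
X_clean = [X[i] for i in kept]
y_clean = [y[i] for i in kept]
print(kept)  # [0, 1, 2, 4, 5, 6, 7]
```

The discretized labels are used only to decide which instances to discard; the regression model would then be trained on `X_clean` and `y_clean`, which here no longer contain the noisy target 9.5.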
Pages: 340-350
Page count: 11