Ranking-based instance selection for pattern classification

被引:25
作者
Cavalcanti, George D. C. [1 ]
Soares, Rodolfo J. O. [1 ]
机构
[1] Univ Fed Pernambuco UFPE, Ctr Informat CIn, Av Jornalista Anibal Fernandes S-N,Cidade Univ, BR-50740560 Recife, PE, Brazil
关键词
Instance selection; Ranking; Instance-based learning; k-nearest neighbor; Classification; PROTOTYPE SELECTION; DYNAMIC CLASSIFIER; REDUCTION; ALGORITHMS;
D O I
10.1016/j.eswa.2020.113269
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In instance-based learning algorithms, the need to store a large number of examples as the training set results in several drawbacks related to large memory requirements, oversensitivity to noise, and slow execution speed. Instance selection techniques can improve the performance of these algorithms by selecting the best instances from the original data set, removing, for example, redundant information and noisy points. The relationship between an instance and the other patterns in the training set plays an important role and can impact its misclassification by learning algorithms. Such a relationship can be represented as a value that measures how difficult such instance is regarding classification purposes. Based on that, we introduce a novel instance selection algorithm called Ranking-based Instance Selection (RIS) that attributes a score per instance that depends on its relationship with all other instances in the training set. In this sense, instances with higher scores form safe regions (neighborhood of samples with relatively homogeneous class labels) in the feature space, and instances with lower scores form an indecision region (borderline samples of different classes). This information is further used in a selection process to remove instances from both safe and indecision regions that are considered irrelevant to represent their clusters in the feature space. In contrast to previous algorithms, the proposal combines a raking procedure with a selection process aiming to find a promising tradeoff between accuracy and reduction rate. Experiments are conducted on twenty-four real-world classification problems and show the effectiveness of the RIS algorithm when compared against other instance selection algorithms in the literature. (C) 2020 Elsevier Ltd. All rights reserved.
引用
收藏
页数:11
相关论文
共 31 条
[1]  
AHA DW, 1991, MACH LEARN, V6, P37, DOI 10.1007/BF00153759
[2]  
Alcalá-Fdez J, 2011, J MULT-VALUED LOG S, V17, P255
[3]   Study of data transformation techniques for adapting single-label prototype selection algorithms to multi-label learning [J].
Arnaiz-Gonzalez, Alvar ;
Diez-Pastor, Jose-Francisco ;
Rodriguez, Juan J. ;
Garcia-Osorio, Cesar .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 109 :114-130
[4]   Instance selection for regression by discretization [J].
Arnaiz-Gonzalez, Alvar ;
Diez-Pastor, Jose F. ;
Rodriguez, Juan J. ;
Ignacio Garcia-Osorio, Cesar .
EXPERT SYSTEMS WITH APPLICATIONS, 2016, 54 :340-350
[5]  
Benavoli A, 2016, J MACH LEARN RES, V17
[6]   PROTOTYPE SELECTION FOR INTERPRETABLE CLASSIFICATION [J].
Bien, Jacob ;
Tibshirani, Robert .
ANNALS OF APPLIED STATISTICS, 2011, 5 (04) :2403-2424
[7]  
Bridle J. S., 1990, Neurocomputing, Algorithms, Architectures and Applications. Proceedings of the NATO Advanced Research Workshop, P227
[8]   ATISA: Adaptive Threshold-based Instance Selection Algorithm [J].
Cavalcanti, George D. C. ;
Ren, Tsang Ing ;
Pereira, Cesar Lima .
EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (17) :6894-6900
[9]   Evolutionary feature and instance selection for traffic sign recognition [J].
Chen, Zong-Yao ;
Lin, Wei-Chao ;
Ke, Shih-Wen ;
Tsai, Chih-Fong .
COMPUTERS IN INDUSTRY, 2015, 74 :201-211
[10]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+