Comparison of Instance Selection and Construction Methods with Various Classifiers

Cited by: 15
Authors
Blachnik, Marcin [1]
Kordos, Miroslaw [2]
Affiliations
[1] Silesian Tech Univ, Fac Mat Engn, Dept Ind Informat, Akad 2A, PL-44100 Gliwice, Poland
[2] Univ Bielsko Biala, Dept Comp Sci, Willowa 2, PL-43309 Bielsko Biala, Poland
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / No. 11
Keywords
machine learning; classification; preprocessing; instance selection; nearest-neighbor; prototype selection; rules; algorithm
DOI
10.3390/app10113933
CLC number
O6 [Chemistry]
Discipline code
0703
Abstract
Instance selection and construction methods were originally designed to improve the k-nearest neighbors (kNN) classifier by increasing its speed and improving its classification accuracy. These goals were achieved by eliminating redundant and noisy samples, thus reducing the size of the training set. In this paper, instance selection methods are investigated in terms of both classification accuracy and reduction of the training set size. The classification accuracy of the following classifiers is evaluated: decision tree, random forest, Naive Bayes, linear model, support vector machine (SVM), and kNN. The results indicate that for most of these classifiers, compressing the training set affects prediction performance, and only a small group of instance selection methods can be recommended as a general-purpose preprocessing step: the learning vector quantization (LVQ) based algorithms, along with Drop2 and Drop3. The other methods are either less efficient or provide a low compression ratio.
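The abstract describes instance selection as removing noisy and redundant samples so that a smaller training set remains. As a minimal illustration only (not the authors' implementation, and not one of the methods they recommend, i.e. LVQ, Drop2 or Drop3), the sketch below chains two classical filters: Wilson's Edited Nearest Neighbor (ENN) to drop noisy samples, then Hart's Condensed Nearest Neighbor (CNN) to drop redundant ones, and reports the compression as 1 - |S|/|T|. The function names, the toy data set, k = 3 for editing, and Euclidean distance are assumptions chosen for illustration.

```python
import math
from collections import Counter

# A labelled sample is (feature_tuple, class_label). Illustrative helpers, not a library API.

def knn_predict(train, x, k=1, exclude=None):
    """Majority vote of the k nearest samples in `train` to point x (Euclidean).
    `exclude` skips one index, which gives leave-one-out prediction."""
    neighbours = sorted(
        (math.dist(features, x), i)
        for i, (features, _) in enumerate(train)
        if i != exclude
    )[:k]
    votes = Counter(train[i][1] for _, i in neighbours)
    return votes.most_common(1)[0][0]


def enn(train, k=3):
    """Wilson's ENN: drop samples misclassified by their k nearest neighbours (noise)."""
    return [s for i, s in enumerate(train) if knn_predict(train, s[0], k, exclude=i) == s[1]]


def cnn(train):
    """Hart's CNN: add each sample that 1-NN on the current kept set misclassifies;
    samples it already classifies correctly are treated as redundant."""
    keep = [0]  # start from the first sample
    changed = True
    while changed:  # repeat passes until no sample is added
        changed = False
        for i, (x, y) in enumerate(train):
            if i in keep:
                continue
            if knn_predict([train[j] for j in keep], x, k=1) != y:
                keep.append(i)
                changed = True
    return [train[j] for j in keep]


def compression(original, selected):
    """Fraction of the training set removed: 1 - |S| / |T|."""
    return 1 - len(selected) / len(original)


# Hypothetical toy data: two clusters plus one mislabelled ("noisy") point at (0.5, 0.5).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
         ((0.5, 0.5), "b")]

edited = enn(train, k=3)  # noise filter
selected = cnn(edited)    # redundancy filter
print(len(selected), "of", len(train), "kept; compression =",
      round(compression(train, selected), 3))
print(knn_predict(selected, (0.2, 0.3)), knn_predict(selected, (5.8, 5.1)))
```

On this toy set, ENN removes the mislabelled point and CNN keeps one prototype per cluster, so 2 of the 9 samples remain. A comparison like the one summarized in the abstract would additionally train each downstream classifier on the selected set and measure its accuracy on held-out data.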
Pages: 19