Cluster-oriented instance selection for classification problems

Times cited: 18
Authors
Saha, Soumitra [1]
Sarker, Partho Sarathi [1]
Al Saud, Alam [2]
Shatabda, Swakkhar [2]
Newton, M. A. Hakim [3, 4]
Affiliations
[1] Univ Global Village, Dept Comp Sci & Engn, C&B Rd, Barishal 8200, Bangladesh
[2] United Int Univ, Dept Comp Sci & Engn, Plot 2,United City,Madani Ave, Dhaka 1212, Bangladesh
[3] Univ Newcastle, Sch Informat & Phys Sci, Univ Dr, Callaghan, NSW 2308, Australia
[4] Griffith Univ, Inst Integrated & Intelligent Syst, 170 Kessels Rd, Nathan, Qld 4111, Australia
Keywords
Instance selection; Data reduction; Classification problems; REDUCTION
DOI
10.1016/j.ins.2022.04.036
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More training instances could lead to better classification accuracy. However, accuracy could also degrade if more training instances bring more noise and outliers. Additional training instances also arguably need additional computational resources in future data mining operations. Instance selection algorithms identify subsets of training instances that could desirably increase accuracy or at least do not decrease accuracy significantly. There exist many instance selection algorithms, but no single algorithm, in general, dominates the others. Moreover, existing instance selection algorithms do not allow direct control of the instance selection rate. In this paper, we present a simple and generic cluster-oriented instance selection algorithm for classification problems. Our proposed algorithm runs an unsupervised K-Means clustering algorithm on the training instances and, for a given selection rate, selects instances from the centers and the borders of the clusters. On 24 benchmark classification problems, when very similar percentages of instances are selected by various instance selection algorithms, K Nearest Neighbours classifiers achieve more than 2%-3% better accuracy when using instances selected by our proposed method than when using those selected by other state-of-the-art generic instance selection algorithms. (c) 2022 Elsevier Inc. All rights reserved.
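The abstract's idea of clustering the training instances and then keeping instances from cluster centers and borders at a chosen selection rate can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "center" and "border" instances are identified or how the budget is divided, so the sketch assumes that center instances are those closest to a cluster's centroid, border instances are those farthest from it, and the per-cluster quota is split by a hypothetical `center_share` parameter. The function names `kmeans` and `select_instances` are likewise illustrative.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct training instances.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every instance to every center, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the final centers.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def select_instances(X, rate, k=3, center_share=0.5, seed=0):
    """Return sorted indices of roughly rate * len(X) instances.

    Assumption (not from the paper): within each cluster, 'center'
    instances are the closest to the centroid and 'border' instances
    are the farthest from it; center_share splits the cluster's quota.
    """
    centers, labels = kmeans(X, k, seed=seed)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        # Each cluster contributes in proportion to its size.
        quota = min(len(idx), max(1, int(round(rate * len(idx)))))
        n_center = int(round(center_share * quota))
        n_border = quota - n_center
        # Cluster members ordered from nearest to farthest from the centroid.
        order = idx[np.argsort(np.linalg.norm(X[idx] - centers[j], axis=1))]
        selected.extend(order[:n_center])
        if n_border > 0:
            selected.extend(order[len(order) - n_border:])
    return np.array(sorted(set(int(i) for i in selected)))
```

Calling `select_instances(X, rate=0.2, k=3)` on a feature matrix `X` returns the indices of about 20% of the rows, which could then be used to train a K Nearest Neighbours classifier on the reduced set. Class labels are not used during selection, matching the abstract's description of the clustering step as unsupervised.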
Pages: 143-158
Number of pages: 16