Cluster-based instance selection for machine classification

Cited by: 0
Author
Ireneusz Czarnowski
Affiliation
[1] Gdynia Maritime University, Department of Information Systems
Source
Knowledge and Information Systems | 2012 / Vol. 30
Keywords
Machine learning; Data mining; Instance selection; Multi-agent system
DOI
Not available
Abstract
Instance selection in supervised machine learning, often referred to as data reduction, aims to decide which instances from the training set should be retained for further use during the learning process. Instance selection can improve the capabilities and generalization properties of the learning model, shorten the learning process, or help in scaling up to large data sources. The paper proposes a cluster-based instance selection approach in which the learning process is executed by a team of agents, and discusses four variants of the approach. The basic assumption is that instance selection is carried out after the training data have been grouped into clusters. A computational experiment was carried out to validate the proposed approach and to investigate how the choice of clustering method influences classification quality.
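The general idea of selecting instances only after the training data have been grouped into clusters can be illustrated with a minimal sketch. It is not the paper's method: the abstract states that selection is carried out by a team of agents, whereas this sketch swaps in a much simpler rule (keep the instance nearest each cluster centre) and assumes plain k-means run separately per class. The names `mean`, `kmeans`, `select_instances`, and `clusters_per_class` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption; the abstract names no method).

    Returns the non-empty clusters as lists of points.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster received no points.
        centroids = [mean(m) if m else centroids[c]
                     for c, m in enumerate(clusters)]
    return [m for m in clusters if m]


def select_instances(X, y, clusters_per_class=2):
    """Cluster each class separately, then keep one prototype per cluster.

    Illustrative selection rule: the retained prototype is the original
    instance closest to its cluster's mean.
    """
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(tuple(features))

    selected = []
    for label, points in by_class.items():
        k = min(clusters_per_class, len(points))
        for members in kmeans(points, k):
            center = mean(members)
            prototype = min(members, key=lambda p: math.dist(p, center))
            selected.append((prototype, label))
    return selected


# Toy data: two well-separated classes with three instances each.
X = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1), (5.0, 5.0), (6.0, 5.0), (5.5, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]
reduced = select_instances(X, y, clusters_per_class=1)
```

With one cluster per class, `reduced` holds a single retained instance for each label, so a classifier would be trained on two instances instead of six. Clustering per class is a design choice of this sketch: it guarantees that every retained instance carries an unambiguous label.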
Pages: 113-133
Page count: 20