Selecting informative rules with parallel genetic algorithm in classification problem

Cited by: 15
Authors
Sarkar, Bikash Kanti [1]
Sana, Shib Sankar [2]
Chaudhuri, Kripasindhu [3]
Affiliations
[1] BIT, Dept Informat Technol, Ranchi 835215, Jharkhand, India
[2] Bhangar Mahavidyalaya CU, Dept Math, Bhangar 743502, WB, India
[3] Jadavpur Univ, Dept Math, Kolkata 32, India
Keywords
Classification; Accuracy; C4.5; Parallel genetic algorithm
DOI
10.1016/j.amc.2011.08.065
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Classification systems are very important for decision making and have attracted the attention of many researchers. Traditional classifiers are usually either domain specific or produce unsatisfactory results on large classification problems and imbalanced data. Hence, genetic algorithms (GAs) have recently been combined with traditional classifiers to find useful knowledge for decision making. However, the main concerns with such GA-based systems are that they cover only a small part of the search space and that their computational cost increases as the population grows. In this paper, a rule-based knowledge discovery model is introduced that combines C4.5 (a decision-tree-based rule induction algorithm) with a new parallel genetic algorithm based on the idea of massive parallelism. The prime goal of the model is to produce a compact set of informative rules for any kind of classification problem. More specifically, the proposed model uses C4.5 as a base method to generate rules, which are then refined by the proposed parallel GA. The developed system is compared with pure C4.5 and with a hybrid system (C4.5 + sequential genetic algorithm) on six real-world benchmark data sets from the UCI (University of California at Irvine) machine learning repository. Experiments on these data sets validate the effectiveness of the new model. The results indicate in particular that the model is powerful for large-volume data sets. (C) 2011 Elsevier Inc. All rights reserved.
Pages: 3247-3264
Number of pages: 18