A genetic algorithm-based approach for building accurate decision trees

被引:30
作者
Fu, ZW
Golden, BL
Lele, S
Raghavan, S
Wasil, EA
机构
[1] Fannie Mae, Washington, DC 20016 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] American Univ, Kogod Sch Business, Washington, DC 20016 USA
关键词
genetic algorithm; decision tree;
D O I
10.1287/ijoc.15.1.3.15152
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
In dealing with a very large data set, it might be impractical to construct a decision tree using ail of the points. Even when it is possible, this might not be the best way to utilize the data. As an alternative, subsets of the original data set can be extracted, a tree can be constructed on each subset, and then parts of individual trees can be combined in a smart way to produce an improved final set of feasible trees or a final tree. In this paper, we take trees generated by a commercial decision tree package, namely, C4.5, and allow them to crossover and mutate (using a genetic algorithm) for a number of generations in order to yield trees of better quality. We conduct a computational study of our approach using a real-life marketing data set. In this study, we divide the data set into training, scoring, and test sets, and find that our approach produces uniformly high-quality decision trees. In addition, we investigate the impact of scaling and demonstrate that our approach can be used effectively on very large data sets.
引用
收藏
页码:3 / 22
页数:20
相关论文
共 30 条
[1]  
[Anonymous], 1998, Genetic programming: an introduction
[2]   An empirical comparison of voting classification algorithms: Bagging, boosting, and variants [J].
Bauer, E ;
Kohavi, R .
MACHINE LEARNING, 1999, 36 (1-2) :105-139
[3]  
Blake C.L., 1998, UCI repository of machine learning databases
[4]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[5]   Genetic programming for knowledge discovery in chest-pain diagnosis [J].
Bojarczuk, CC ;
Lopes, HS ;
Freitas, AA .
IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, 2000, 19 (04) :38-44
[6]  
Bot MCJ, 2000, LECT NOTES COMPUT SC, V1802, P247
[7]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[8]  
Breiman L, 1998, ANN STAT, V26, P801
[9]  
Cormen T. H., 1990, INTRO ALGORITHMS
[10]  
Deo N., 1974, GRAPH THEORY APPL EN, V1st