An empirical comparison of ensemble methods based on classification trees

Cited: 64
Authors
Hamza, M [1 ]
Larocque, D [1 ]
Affiliation
[1] HEC Montreal, Dept Management Sci, Montreal, PQ H3T 2A7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
bagging; boosting; arcing; random forest (RF); classification tree; CART; noise; linear combination of variables; splitting rule; Gini; entropy; twoing;
DOI
10.1080/00949650410001729472
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification
081203; 0835
Abstract
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. The comparison uses 14 publicly available data sets that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203-228.]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from several perspectives: the effect of noise, the effect of allowing linear combinations of variables in the construction of the trees, the differences between splitting criteria and, specifically for RF, the effect of the number of variables from which the best split is chosen at each node. Moreover, we compare our results with those obtained by Lim et al. [2000]. In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effects of allowing linear combinations and of the choice of splitting criterion are small on average, but can be substantial for some data sets.
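As an illustration of the kind of comparison the abstract describes, the following sketch (not the authors' code) estimates cross-validated error for a single tree, Bagging, Boosting and RF with scikit-learn. The breast-cancer data set stands in for the 14 Lim et al. data sets, the flip_labels helper and the 10% noise rate are illustrative assumptions, and max_features plays the role of the number of variables sampled at each node in RF.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

def flip_labels(y, rate, rng):
    # Illustrative noise model: flip a fraction `rate` of the binary labels.
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
    "random forest (RF)": RandomForestClassifier(n_estimators=100, random_state=0),
}

rng = np.random.default_rng(0)
for noise in (0.0, 0.10):  # clean labels vs. 10% label noise
    y_used = flip_labels(y, noise, rng) if noise > 0 else y
    print(f"label noise = {noise:.0%}")
    for name, model in models.items():
        acc = cross_val_score(model, X, y_used, cv=10)
        print(f"  {name:20s} error = {1 - acc.mean():.3f}")

# RF-specific question studied in the paper: the number of variables from
# which the best split is chosen at each node (max_features in scikit-learn).
for m in (1, 2, int(np.sqrt(X.shape[1])), X.shape[1]):
    rf = RandomForestClassifier(n_estimators=100, max_features=m, random_state=0)
    acc = cross_val_score(rf, X, y, cv=10)
    print(f"max_features = {m:2d}  error = {1 - acc.mean():.3f}")

The splitting-rule comparison in the keywords (Gini vs. entropy) maps to the criterion parameter of the scikit-learn trees; the twoing rule from CART is not available there. Nothing in this sketch reproduces the paper's 14-data-set protocol; it only mirrors the structure of the experiments.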
Pages: 629-643
Number of pages: 15
References (15 in total)
[1] Bauer, E. and Kohavi, R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 1999, 36(1-2): 105-139.
[2] Breiman, L. Random forests. Machine Learning, 2001, 45(1): 5-32.
[3] Breiman, L. Technical note: Some properties of splitting criteria. Machine Learning, 1996, 24(1): 41-47.
[4] Breiman, L. Arcing classifiers. Annals of Statistics, 1998, 26(3): 801-849.
[5] Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 2000, 40(2): 139-157.
[6] Dietterich, T.G. Ensemble methods in machine learning. Multiple Classifier Systems, 2000, 1857: 1-15.
[7] Freund, Y. An adaptive version of the boost by majority algorithm. Machine Learning, 2001, 43(3): 293-318.
[8] Freund, Y. and Schapire, R.E. Experiments with a new boosting algorithm. ICML, 1996.