Out-of-bag estimation of the optimal sample size in bagging

Cited by: 93
Authors
Martinez-Munoz, Gonzalo [1]
Suarez, Alberto [1]
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super 11, E-28049 Madrid, Spain
Keywords
Bagging; Subagging; Bootstrap sampling; Subsampling; Optimal sampling ratio; Ensembles of classifiers; Decision trees; CLASSIFIERS; CLASSIFICATION
DOI
10.1016/j.patcog.2009.05.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set, m(wr) = n. Without-replacement methods typically use half samples, m(wor) = n/2. These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size is such that the out-of-bag error of the ensemble is as low as possible generally improve the performance of standard bagging and can be efficiently built. (C) 2009 Elsevier Ltd. All rights reserved.
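The selection procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a 1-nearest-neighbour rule standing in for the decision trees named in the keywords, the data are synthetic, and the names `make_data`, `oob_error`, and `select_ratio`, the grid of ratios, and the ensemble size are all assumptions made for illustration. For each candidate ratio m/n, every ensemble member is trained on m sampled instances, each training instance is classified only by the members that did not see it, and the ratio with the lowest resulting out-of-bag error is kept.

```python
import random
from collections import Counter


def make_data(n=80, seed=1):
    """Synthetic two-class problem (illustrative only): two overlapping Gaussians."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        X.append((rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)))
        y.append(label)
    return X, y


def nearest_label(train_idx, X, y, query):
    """1-NN prediction using only the instances indexed by train_idx."""
    best = min(
        train_idx,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)),
    )
    return y[best]


def oob_error(X, y, m, n_estimators, replace, rng):
    """Out-of-bag error of an ensemble whose members are each trained on m instances."""
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            sample = [rng.randrange(n) for _ in range(m)]  # with replacement
        else:
            sample = rng.sample(range(n), m)               # without replacement
        in_bag = set(sample)
        train_idx = list(in_bag)  # duplicates do not change a 1-NN prediction
        for i in range(n):
            if i not in in_bag:   # only members that did not see i may vote on i
                votes[i][nearest_label(train_idx, X, y, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:                # e.g. m = n without replacement: nothing is out of bag
        return float("inf")
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def select_ratio(X, y, ratios, n_estimators=25, replace=False, seed=0):
    """Return the sampling ratio m/n with the lowest out-of-bag error, plus all errors."""
    rng = random.Random(seed)
    n = len(X)
    errors = {}
    for r in ratios:
        m = max(1, round(r * n))
        errors[r] = oob_error(X, y, m, n_estimators, replace, rng)
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    X, y = make_data()
    best, errors = select_ratio(X, y, ratios=[0.1, 0.2, 0.4, 0.6, 0.8])
    print("selected sampling ratio:", best)
    print("out-of-bag error per ratio:", errors)
```

Because the out-of-bag votes come from instances each member never saw, no separate validation set is needed to compare ratios. In this sketch, sampling without replacement with m = n leaves no out-of-bag instances, so that setting is reported as an infinite error and can never be selected.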
Pages: 143-152
Number of pages: 10