Selective Rademacher penalization and reduced error pruning of decision trees

Times cited: 0
Authors
Kääriäinen, M
Malinen, T
Elomaa, T
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Tampere Univ Technol, Inst Software Syst, FI-33101 Tampere, Finland
Keywords
generalization error analysis; data-dependent generalization error bounds; Rademacher complexity; decision trees; reduced error pruning
DOI
Not available
CLC classification number
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It appears to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees, by considering the prunings of a given decision tree rather than the tree-growing phase. This study constitutes the first application of Rademacher penalization to a hypothesis class of practical significance. We present two variants of the approach: one in which the hypothesis class consists of all prunings of the initial tree, and another in which only the prunings that are accurate on the growing data are taken into account. Moreover, we generalize the error-bounding approach from binary classification to multi-class settings. Our empirical experiments indicate that the proposed bounds outperform distribution-independent bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
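To make the technique concrete, the following is a minimal sketch of how an empirical Rademacher penalty can be estimated for a finite hypothesis class; in the setting of the paper that class would be the set of prunings of a fixed tree, which is represented below simply by a matrix of per-hypothesis predictions. The function name, the Monte Carlo estimation, and the toy data are illustrative assumptions, not the authors' implementation.

import numpy as np

def empirical_rademacher_penalty(preds, n_draws=200, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite hypothesis class.

    preds : array of shape (n_hypotheses, n_samples) with entries in
        {-1, +1}; row j holds hypothesis j's predictions on the fixed
        sample (in the paper's setting, one row per pruning of a tree).
    """
    rng = np.random.default_rng(rng)
    _, n = preds.shape
    sups = np.empty(n_draws)
    for t in range(n_draws):
        sigma = rng.choice((-1, 1), size=n)  # i.i.d. Rademacher signs
        # Supremum over the class of (1/n) * sum_i sigma_i * h(x_i).
        sups[t] = np.max(preds @ sigma) / n
    return float(sups.mean())

# Toy usage: eight random "hypotheses" evaluated on 500 sample points.
rng = np.random.default_rng(0)
preds = rng.choice((-1, 1), size=(8, 500))
penalty = empirical_rademacher_penalty(preds, rng=rng)
# A Koltchinskii-style data-dependent bound then reads: with probability
# at least 1 - delta over the sample,
#   err(h) <= emp_err(h) + 2 * penalty + O(sqrt(log(1/delta) / n))
# simultaneously for every h in the class.
print(f"estimated Rademacher penalty: {penalty:.3f}")

Note that the explicit enumeration over hypotheses above is only a generic stand-in: for the pruning class considered in the paper, maximizing the signed empirical average amounts to minimizing error on a sign-relabeled sample, which is the kind of optimization that reduced error pruning solves efficiently and which presumably makes this class computationally tractable.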
Pages: 1107-1126
Page count: 20