Selective Rademacher penalization and reduced error pruning of decision trees

被引:0
作者
Kääriäinen, M
Malinen, T
Elomaa, T
机构
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Tampere Univ Technol, Inst Software Syst, FI-33101 Tampere, Finland
关键词
generalization error analysis; data-dependent generalization error bounds; Rademacher complexity; decision trees; reduced error pruning;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It appears to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we, nevertheless, apply Rademacher penalization to the in practice important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree growing phase. This study constitutes the first application of Rademacher penalization to hypothesis classes that have practical significance. We present two variations of the approach, one in which the hypothesis class consists of all prunings of the initial tree and another in which only the prunings that are accurate on growing data are taken into account. Moreover, we generalize the error-bounding approach from binary classification to multi-class situations. Our empirical experiments indicate that the proposed new bounds outperform distribution-independent bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
引用
收藏
页码:1107 / 1126
页数:20
相关论文
共 38 条
  • [1] An efficient algorithm for optimal pruning of decision trees
    Almuallim, H
    [J]. ARTIFICIAL INTELLIGENCE, 1996, 83 (02) : 347 - 362
  • [2] [Anonymous], 1982, ESTIMATION DEPENDENC
  • [3] AUER P, 1995, P 12 INT C MACH LEAR, P21
  • [4] BARTLETT P, 2002, LECT NOTES COMPUT SC, V2375, P44
  • [5] Bartlett P. L., 2003, Journal of Machine Learning Research, V3, P463, DOI 10.1162/153244303321897690
  • [6] BARTLETT PL, 2004, IN PRESS ANN STAT
  • [7] Blake C.L., 1998, UCI repository of machine learning databases
  • [8] OCCAM RAZOR
    BLUMER, A
    EHRENFEUCHT, A
    HAUSSLER, D
    WARMUTH, MK
    [J]. INFORMATION PROCESSING LETTERS, 1987, 24 (06) : 377 - 380
  • [9] LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION
    BLUMER, A
    EHRENFEUCHT, A
    HAUSSLER, D
    WARMUTH, MK
    [J]. JOURNAL OF THE ACM, 1989, 36 (04) : 929 - 965
  • [10] BOHANEC M, 1994, MACH LEARN, V15, P223, DOI 10.1023/A:1022685808937