Mining Optimal Decision Trees from Itemset Lattices

被引:0
|
作者
Nijssen, Siegfried [1 ]
Fromont, Elisa [1 ]
机构
[1] Katholieke Univ Leuven, Leuven, Belgium
来源
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007年
关键词
Decision trees; Frequent itemsets; Formal concepts; Constraint-based mining;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present DL8, an exact algorithm for finding a decision tree that optimizes a ranking function under size, depth, accuracy and leaf constraints. Because the discovery of optimal trees has high theoretical complexity, until now few efforts have been made to compute such trees for real-world datasets. An exact algorithm is of both scientific and practical interest. Front a scientific point of view, it can be used as a gold standard to evaluate the performance of heuristic constraint-based decision tree learners and to gain new insight in traditional decision tree learners. From the application point of view, it can be used to discover trees that cannot be found by heuristic decision tree learners. The key idea behind our algorithm is that there is a relation between constraints on decision trees and constraints on itemsets. We show that optimal decision trees can be extracted from lattices of itemsets in linear time. We give several strategies to efficiently build these lattices. Experiments show that under the same constraints, DL8 obtains better results than C4.5, which confirms that exhaustive search does not always imply overfitting. The results also show that DL8 is a useful and interesting tool to learn decision trees under constraints.
引用
收藏
页码:530 / 539
页数:10
相关论文
共 50 条
  • [31] Searching for high-support itemsets in itemset trees
    Li, Yu
    Kubat, Miroslav
    INTELLIGENT DATA ANALYSIS, 2006, 10 (02) : 105 - 120
  • [32] Decision Trees for Mining Data Streams Based on the McDiarmid's Bound
    Rutkowski, Leszek
    Pietruczuk, Lena
    Duda, Piotr
    Jaworski, Maciej
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (06) : 1272 - 1279
  • [33] Optimal decision trees for categorical data via integer programming
    Oktay Günlük
    Jayant Kalagnanam
    Minhan Li
    Matt Menickelly
    Katya Scheinberg
    Journal of Global Optimization, 2021, 81 : 233 - 260
  • [34] Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems
    Gupta, Anupam
    Nagarajan, Viswanath
    Ravi, R.
    MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (03) : 876 - 896
  • [35] Optimal decision trees for categorical data via integer programming
    Gunluk, Oktay
    Kalagnanam, Jayant
    Li, Minhan
    Menickelly, Matt
    Scheinberg, Katya
    JOURNAL OF GLOBAL OPTIMIZATION, 2021, 81 (01) : 233 - 260
  • [36] MurTree: Optimal Decision Trees via Dynamic Programming and Search
    Demirovic, Emir
    Lukina, Anna
    Hebrard, Emmanuel
    Chan, Jeffrey
    Bailey, James
    Leckie, Christopher
    Ramamohanarao, Kotagiri
    Stuckey, Peter J.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [37] A Survey on Closed Frequent Itemset Mining on Data Streams
    Bai, Pavitra . S.
    Kumar, Ravi . G. . K.
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016, : 542 - 547
  • [38] Learning from crowds with decision trees
    Wenjun Yang
    Chaoqun Li
    Liangxiao Jiang
    Knowledge and Information Systems, 2022, 64 : 2123 - 2140
  • [39] Learning from crowds with decision trees
    Yang, Wenjun
    Li, Chaoqun
    Jiang, Liangxiao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (08) : 2123 - 2140
  • [40] Data Mining Classification Experiments with Decision Trees over the Forest Covertype Database
    Badulescu, Laviniu Aurelian
    2017 21ST INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2017, : 236 - 241