Mining Optimal Decision Trees from Itemset Lattices

被引:0
|
作者
Nijssen, Siegfried [1 ]
Fromont, Elisa [1 ]
机构
[1] Katholieke Univ Leuven, Leuven, Belgium
来源
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007年
关键词
Decision trees; Frequent itemsets; Formal concepts; Constraint-based mining;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present DL8, an exact algorithm for finding a decision tree that optimizes a ranking function under size, depth, accuracy and leaf constraints. Because the discovery of optimal trees has high theoretical complexity, until now few efforts have been made to compute such trees for real-world datasets. An exact algorithm is of both scientific and practical interest. Front a scientific point of view, it can be used as a gold standard to evaluate the performance of heuristic constraint-based decision tree learners and to gain new insight in traditional decision tree learners. From the application point of view, it can be used to discover trees that cannot be found by heuristic decision tree learners. The key idea behind our algorithm is that there is a relation between constraints on decision trees and constraints on itemsets. We show that optimal decision trees can be extracted from lattices of itemsets in linear time. We give several strategies to efficiently build these lattices. Experiments show that under the same constraints, DL8 obtains better results than C4.5, which confirms that exhaustive search does not always imply overfitting. The results also show that DL8 is a useful and interesting tool to learn decision trees under constraints.
引用
收藏
页码:530 / 539
页数:10
相关论文
共 50 条
  • [21] Optimal Interpretable Clustering Using Oblique Decision Trees
    Gabidolla, Magzhan
    Carreira-Perpinan, Miguel A.
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 400 - 410
  • [22] Decision Trees for Mining Data Streams Based on the Gaussian Approximation
    Rutkowski, Leszek
    Jaworski, Maciej
    Pietruczuk, Lena
    Duda, Piotr
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (01) : 108 - 119
  • [23] Learning Optimal Decision Trees Under Memory Constraints
    Aglin, Gael
    Nijssen, Siegfried
    Schaus, Pierre
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT V, 2023, 13717 : 393 - 409
  • [24] Hybrid Splitting Criterion in Decision Trees for Data Stream Mining
    Jaworski, Maciej
    Rutkowski, Leszek
    Pawlak, Miroslaw
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, (ICAISC 2016), PT II, 2016, 9693 : 60 - 72
  • [25] Globally optimal fuzzy decision trees for classification and regression
    Suárez, A
    Lutsko, JF
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (12) : 1297 - 1311
  • [26] Minimax-optimal classification with dyadic decision trees
    Scott, C
    Nowak, RD
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (04) : 1335 - 1353
  • [27] An Improved Version of the Frequent Itemset Mining Algorithm
    Butincu, Cristian Nicolae
    Craus, Mitica
    2015 14TH ROEDUNET INTERNATIONAL CONFERENCE - NETWORKING IN EDUCATION AND RESEARCH (ROEDUNET NER), 2015, : 184 - 189
  • [28] Approximate Parallel High Utility Itemset Mining
    Chen, Yan
    An, Aijun
    BIG DATA RESEARCH, 2016, 6 : 26 - 42
  • [29] High-utility and diverse itemset mining
    Verma, Amit
    Dawar, Siddharth
    Kumar, Raman
    Navathe, Shamkant
    Goyal, Vikram
    APPLIED INTELLIGENCE, 2021, 51 (07) : 4649 - 4663
  • [30] Frequent Itemset Mining Techniques - A Technical Review
    Chaure, Tushar M.
    Singh, Kavita R.
    2016 WORLD CONFERENCE ON FUTURISTIC TRENDS IN RESEARCH AND INNOVATION FOR SOCIAL WELFARE (STARTUP CONCLAVE), 2016,