Evaluating association rules and decision trees to predict multiple target attributes

Cited by: 23
Authors
Ordonez, Carlos [1]
Zhao, Kai [1]
Affiliations
[1] Univ Houston, Dept Comp Sci, Houston, TX 77204 USA
Funding
US National Science Foundation;
Keywords
Association rule; decision tree; classification; search constraint; FREQUENT PATTERNS; INFECTION-CONTROL; HEART-DISEASE; SURVEILLANCE; DISCOVERY; DATABASES;
DOI
10.3233/IDA-2010-0462
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Association rules and decision trees are two well-known data mining techniques for finding predictive rules. In this work, we present a detailed comparison between constrained association rules and decision trees for predicting multiple target attributes, and we identify important differences between the two techniques for this goal. We conduct an extensive experimental evaluation on a real medical data set to mine rules predicting disease in multiple heart arteries. The antecedent of each association rule contains medical measurements and patient risk factors, whereas the consequent refers to the degree of disease in one artery or in multiple arteries. Predictive rules found by constrained association rule mining are more abundant and have higher reliability than predictive rules induced by decision trees. We investigate why decision trees miss certain rules, why their rules tend to have lower confidence, and whether decision trees can be improved to match constrained association rules. Based on our experimental results, we show that association rules, compared to decision trees, tend to have higher confidence, involve larger subsets of the data set, work better with user-defined binning, and are easier to interpret.
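As a rough illustration of the terms used in the abstract, the sketch below computes support and confidence for a constrained rule whose antecedent holds risk factors and whose consequent holds one or more artery findings. The records, item names (`age>=60`, `LAD>=70`, and so on), and helper functions are all hypothetical and are not taken from the paper's data set or code.

```python
from typing import FrozenSet, List, Set

# Hypothetical toy records: each patient is a set of binned items.
# Item names and values are illustrative only, not from the paper.
RECORDS: List[Set[str]] = [
    {"age>=60", "smoker", "LAD>=70"},
    {"age>=60", "smoker", "LAD>=70", "RCA>=70"},
    {"age>=60", "LAD>=70"},
    {"smoker", "RCA>=70"},
    {"age>=60", "smoker"},
]

# Assumed search constraint: risk factors may appear only in the
# antecedent, artery findings only in the consequent.
ANTECEDENT_ITEMS: FrozenSet[str] = frozenset({"age>=60", "smoker"})
CONSEQUENT_ITEMS: FrozenSet[str] = frozenset({"LAD>=70", "RCA>=70"})


def satisfies_constraint(antecedent: Set[str], consequent: Set[str]) -> bool:
    """Check that each side of the rule uses only its allowed items."""
    return (
        bool(antecedent)
        and bool(consequent)
        and antecedent <= ANTECEDENT_ITEMS
        and consequent <= CONSEQUENT_ITEMS
    )


def support(itemset: Set[str], records: List[Set[str]]) -> float:
    """Fraction of records that contain every item in the itemset."""
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(antecedent: Set[str], consequent: Set[str],
               records: List[Set[str]]) -> float:
    """Estimate P(consequent | antecedent) from the records."""
    antecedent_support = support(antecedent, records)
    if antecedent_support == 0:
        return 0.0
    return support(antecedent | consequent, records) / antecedent_support


# Rule with a single target attribute.
single_ante, single_cons = {"age>=60", "smoker"}, {"LAD>=70"}
single_support = support(single_ante | single_cons, RECORDS)
single_confidence = confidence(single_ante, single_cons, RECORDS)

# Rule with multiple target attributes in the consequent.
multi_cons = {"LAD>=70", "RCA>=70"}
multi_support = support(single_ante | multi_cons, RECORDS)
multi_confidence = confidence(single_ante, multi_cons, RECORDS)
```

A rule with several artery items in its consequent can never have higher support or confidence than the same antecedent paired with any one of those items alone, which is one reason predicting multiple targets at once is the harder setting.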
Pages: 173-192
Number of pages: 20