Forest-ORE: Mining an optimal rule ensemble to interpret random forest models

Cited by: 0
Authors
Haddouchi, Maissae [1]
Berrado, Abdelaziz [1]
Affiliations
[1] Mohammed V Univ Rabat, Ecole Mohammadia Ingn EMI, AMIPS Res Team, Rabat, Morocco
Keywords
Interpretability; Optimization; Tree ensemble; Random forest; Rule ensemble; CLASSIFICATION; SET; NUMBER;
DOI
10.1016/j.engappai.2024.109997
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Random Forest (RF) is well known as an efficient ensemble learning method with strong predictive performance. However, it is often regarded as a "black box" due to its reliance on hundreds of deep decision trees. This lack of interpretability can be a real drawback for the acceptance of RF models in several real-world applications, especially those affecting individuals' lives. In this work, we present Forest-ORE, a method that makes RF interpretable via an optimized rule ensemble (ORE) for local and global interpretation. Unlike other rule-based approaches aimed at interpreting the RF model, this method simultaneously considers several parameters that influence the choice of an interpretable rule ensemble. Existing methods often prioritize predictive performance over interpretability coverage and do not account for existing overlaps or interactions between rules. Forest-ORE uses a mixed-integer optimization program to build an ORE that considers the trade-off between predictive performance, interpretability coverage, and model size (ensemble size, rule length, and overlap). In addition to producing an ORE competitive with RF in predictive performance, this method enriches the ORE through other rules that afford complementary information. This framework is illustrated through an example, and its robustness is evaluated across 36 benchmark datasets. A comparative analysis with well-known methods shows that Forest-ORE achieves an excellent trade-off between predictive performance, interpretability coverage, and model size.
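The trade-off described in the abstract can be illustrated with a toy sketch. This is not the Forest-ORE formulation: the abstract does not give the objective or constraints, so the scoring function, the weights, the rule data, and the names `RULES`, `score`, and `select_rules` below are all hypothetical. Exhaustive enumeration also stands in for a mixed-integer solver, which is only feasible here because the example has four rules.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): for each rule, the samples it
# covers, the covered samples it classifies correctly, and its length
# (number of conditions).
RULES = {
    "r1": {"covers": {0, 1, 2, 3}, "correct": {0, 1, 2}, "length": 2},
    "r2": {"covers": {2, 3, 4, 5}, "correct": {2, 3, 4, 5}, "length": 3},
    "r3": {"covers": {4, 5, 6, 7}, "correct": {4, 5, 6}, "length": 1},
    "r4": {"covers": {0, 1, 6, 7}, "correct": {0, 1, 6, 7}, "length": 4},
}
N_SAMPLES = 8


def score(subset, w_size=0.05, w_len=0.02, w_overlap=0.05):
    """Assumed objective: reward coverage and correctness, penalize
    ensemble size, total rule length, and overlap between rules."""
    covered = set().union(*(RULES[r]["covers"] for r in subset))
    correct = set().union(*(RULES[r]["correct"] for r in subset))
    # Overlap: how many times samples are covered beyond the first time.
    overlap = sum(len(RULES[r]["covers"]) for r in subset) - len(covered)
    total_len = sum(RULES[r]["length"] for r in subset)
    return (
        len(covered) / N_SAMPLES
        + len(correct) / N_SAMPLES
        - w_size * len(subset)
        - w_len * total_len
        - w_overlap * overlap / N_SAMPLES
    )


def select_rules():
    """Enumerate every non-empty rule subset and keep the best-scoring one."""
    names = sorted(RULES)
    candidates = (
        c for k in range(1, len(names) + 1) for c in combinations(names, k)
    )
    best = max(candidates, key=score)
    return best, score(best)


if __name__ == "__main__":
    best, best_score = select_rules()
    print(best, round(best_score, 3))
```

Raising `w_size` or `w_overlap` pushes the selection toward fewer, less redundant rules at the cost of coverage, which is the kind of tension the abstract says the optimization program balances.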
Pages: 14
Related papers
70 in total
  • [1] Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm
    Adnan, Md Nasim
    Islam, Md Zahidul
    [J]. KNOWLEDGE-BASED SYSTEMS, 2016, 110 : 86 - 97
  • [2] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [3] Alcalá-Fdez J, 2011, J MULT-VALUED LOG S, V17, P255
  • [4] Radial Sets: Interactive Visual Analysis of Large Overlapping Sets
    Alsallakh, Bilal
    Aigner, Wolfgang
    Miksch, Silvia
    Hauser, Helwig
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2013, 19 (12) : 2496 - 2505
  • [5] Interpretable regularized class association rules algorithm for classification in a categorical data space
    Azmi, Mohamed
    Runger, George C.
    Berrado, Abdelaziz
    [J]. INFORMATION SCIENCES, 2019, 483 : 313 - 331
  • [6] Detecting group differences: Mining contrast sets
    Bay, SD
    Pazzani, MJ
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2001, 5 (03) : 213 - 246
  • [7] Constraint-based rule mining in large, dense databases
    Bayardo, RJ
    Agrawal, R
    Gunopulos, D
    [J]. 15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, : 188 - 197
  • [8] Beckett C., 2018, All Graduate Plan B and Other Reports 1335
  • [9] SIRUS: Stable and Interpretable RUle Set for classification
    Benard, Clement
    Biau, Gerard
    Da Veiga, Sebastien
    Scornet, Erwan
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (01) : 427 - 505
  • [10] Benavoli A, 2016, J MACH LEARN RES, V17