Forest-ORE: Mining an optimal rule ensemble to interpret random forest models

Cited by: 0

Authors:

Haddouchi, Maissae [1]

Berrado, Abdelaziz [1]

Affiliations:

[1] Mohammed V Univ Rabat, Ecole Mohammadia Ingn EMI, AMIPS Res Team, Rabat, Morocco

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2025 / Volume 143

Keywords:

Interpretability; Optimization; Tree ensemble; Random forest; Rule ensemble; CLASSIFICATION; SET; NUMBER;

DOI:

10.1016/j.engappai.2024.109997

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Random Forest (RF) is well known as an efficient ensemble learning method with strong predictive performance. However, it is often regarded as a "black box" due to its reliance on hundreds of deep decision trees. This lack of interpretability can be a real drawback for the acceptance of RF models in several real-world applications, especially those affecting individuals' lives. In this work, we present Forest-ORE, a method that makes RF interpretable via an optimized rule ensemble (ORE) for local and global interpretation. Unlike other rule-based approaches aimed at interpreting the RF model, this method simultaneously considers several parameters that influence the choice of an interpretable rule ensemble. Existing methods often prioritize predictive performance over interpretability coverage and do not account for existing overlaps or interactions between rules. Forest-ORE uses a mixed-integer optimization program to build an ORE that considers the trade-off between predictive performance, interpretability coverage, and model size (ensemble size, rule length, and overlap). In addition to producing an ORE competitive with RF in predictive performance, this method enriches the ORE through other rules that afford complementary information. This framework is illustrated through an example, and its robustness is evaluated across 36 benchmark datasets. A comparative analysis with well-known methods shows that Forest-ORE achieves an excellent trade-off between predictive performance, interpretability coverage, and model size.
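The trade-off named in the abstract (predictive performance, coverage, ensemble size, rule length, overlap) can be illustrated with a toy rule-selection sketch. This is not the paper's mixed-integer program: the rules, labels, weights, and the `score` and `best_subset` helpers are all hypothetical, and an exhaustive search over subsets stands in for a real solver.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper):
# rule name -> (indices of covered samples, predicted class, rule length)
RULES = {
    "r1": ({0, 1, 2}, 1, 2),
    "r2": ({2, 3}, 0, 1),
    "r3": ({3, 4, 5}, 0, 3),
    "r4": ({0, 1}, 1, 1),
}
LABELS = [1, 1, 1, 0, 0, 0]


def score(subset, w_err=1.0, w_size=0.05, w_len=0.02, w_overlap=0.1):
    """Reward coverage; penalize errors, ensemble size, rule length, overlap."""
    n = len(LABELS)
    covered = set()
    errors = 0
    total_len = 0
    total_hits = 0
    for name in subset:
        samples, pred, length = RULES[name]
        covered |= samples
        errors += sum(1 for i in samples if LABELS[i] != pred)
        total_len += length
        total_hits += len(samples)
    # Overlap: how many times a sample is covered beyond its first rule.
    overlap = total_hits - len(covered)
    return (len(covered) / n
            - w_err * errors / n
            - w_size * len(subset)
            - w_len * total_len
            - w_overlap * overlap / n)


def best_subset():
    """Exhaustively score every non-empty subset of rules (toy scale only)."""
    names = sorted(RULES)
    candidates = (c for k in range(1, len(names) + 1)
                  for c in combinations(names, k))
    best = max(candidates, key=score)
    return best, score(best)


if __name__ == "__main__":
    chosen, value = best_subset()
    print(chosen)  # ('r1', 'r3')
```

With these made-up weights the search keeps the two non-overlapping, error-free rules that together cover every sample. Exhaustive enumeration grows exponentially with the number of rules, which is one reason a dedicated optimization formulation is needed at realistic scale.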

Pages: 14
