Stable and actionable explanations of black-box models through factual and counterfactual rules

Cited by: 15
Authors
Guidotti, Riccardo [1]
Monreale, Anna [1]
Ruggieri, Salvatore [1]
Naretto, Francesca [2]
Turini, Franco [1]
Pedreschi, Dino [1]
Giannotti, Fosca [2]
Affiliations
[1] Univ Pisa, Dept Comp Sci, Largo B Pontecorvo 3, I-56127 Pisa, PI, Italy
[2] Scuola Normale Super Pisa, Piazza Cavalieri 7, I-56126 Pisa, PI, Italy
Funding
UK Engineering and Physical Sciences Research Council; EU Horizon 2020; European Research Council
Keywords
Explainable AI; Local explanations; Model-agnostic explanations; Rule-based explanations; Counterfactuals; INSTANCE SELECTION; ALGORITHMS;
DOI
10.1007/s10618-022-00878-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. Explaining the decision taken by a black-box classifier on a specific input instance is therefore of striking interest. We propose a local rule-based model-agnostic explanation method providing stable and actionable explanations. An explanation consists of a factual logic rule, stating the reasons for the black-box decision, and a set of actionable counterfactual logic rules, proactively suggesting the changes in the instance that lead to a different outcome. Explanations are computed from a decision tree that mimics the behavior of the black-box locally to the instance to explain. The decision tree is obtained through a bagging-like approach that favors stability and fidelity: first, an ensemble of decision trees is learned from neighborhoods of the instance under investigation; then, the ensemble is merged into a single decision tree. Neighbor instances are synthetically generated through a genetic algorithm whose fitness function is driven by the black-box behavior. Experiments show that the proposed method advances the state-of-the-art towards a comprehensive approach that successfully covers stability and actionability of factual and counterfactual explanations.
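The way a factual rule and counterfactual rules can be read off a local surrogate decision tree, as the abstract describes, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tree `TREE`, the feature names `income` and `debt`, the thresholds, and the helper names `paths`, `holds`, and `explain` are all hypothetical, and choosing counterfactuals by the fewest unsatisfied split conditions is an assumed selection criterion that the abstract does not specify. Neighborhood generation and tree merging are not covered here.

```python
# Hypothetical hand-written surrogate tree. In the method described above the
# tree would be learned from a synthetic neighborhood labeled by the black box.
TREE = {
    "feature": "income", "threshold": 30.0,
    "left": {"label": "deny"},                # income <= 30
    "right": {                                # income > 30
        "feature": "debt", "threshold": 10.0,
        "left": {"label": "grant"},           # debt <= 10
        "right": {"label": "deny"},           # debt > 10
    },
}


def paths(node, premises=()):
    """Yield (premises, label) for every root-to-leaf path of the tree."""
    if "label" in node:
        yield list(premises), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from paths(node["left"], premises + ((feature, "<=", threshold),))
    yield from paths(node["right"], premises + ((feature, ">", threshold),))


def holds(premise, x):
    """Check whether instance x satisfies a single split condition."""
    feature, op, threshold = premise
    return x[feature] <= threshold if op == "<=" else x[feature] > threshold


def explain(tree, x):
    """Return the factual rule for x and the closest counterfactual rules.

    Assumption: 'closest' means the fewest split conditions that x violates.
    Each counterfactual is (premises, label, violated_conditions).
    """
    rules = list(paths(tree))
    # The factual rule is the unique path whose conditions x satisfies.
    factual = next(r for r in rules if all(holds(p, x) for p in r[0]))
    candidates = []
    for premises, label in rules:
        if label != factual[1]:
            violated = [p for p in premises if not holds(p, x)]
            candidates.append((len(violated), premises, label, violated))
    if not candidates:
        return factual, []
    fewest = min(c[0] for c in candidates)
    counterfactuals = [(prem, lab, viol)
                       for n, prem, lab, viol in candidates if n == fewest]
    return factual, counterfactuals


# Usage: explain one instance of the hypothetical loan example.
instance = {"income": 45.0, "debt": 18.0}
factual_rule, counterfactual_rules = explain(TREE, instance)
```

For this instance the factual rule is the path `income > 30` and `debt > 10` leading to `deny`, and the single counterfactual reaches `grant` by changing only the `debt` condition. Whether such a change is actionable for the user is a separate check that this sketch does not model.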
Pages: 2825-2862
Page count: 38