Approximation of SHAP Values for Randomized Tree Ensembles

Cited by: 7
Authors
Loecher, Markus [1 ]
Lai, Dingyi [2 ]
Qi, Wu [2 ]
Affiliations
[1] Berlin Sch Econ & Law, D-10825 Berlin, Germany
[2] Humboldt Univ, Dept Stat, Berlin, Germany
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION, CD-MAKE 2022 | 2022, Vol. 13480
Keywords
SHAP values; Saabas value; Variable importance; Random forests; Boosting; Gini impurity
DOI
10.1007/978-3-031-14463-9_2
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Classification and regression trees offer straightforward methods of attributing importance values to input features, either globally or for a single prediction. Conditional feature contributions (CFCs) yield local, case-by-case explanations of a prediction by following the decision path and attributing changes in the expected output of the model to each feature along the path. However, CFCs suffer from a potential bias which depends on the distance from the root of a tree. SHapley Additive exPlanation (SHAP) values, the by now immensely popular alternative, appear to mitigate this bias but are computationally much more expensive. Here we contribute a thorough, empirical comparison of the explanations computed by both methods on a set of 164 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. For random forests and boosted trees, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations. Unsurprisingly, these insights extend to the fidelity of using global feature importance scores as a proxy for the predictive power associated with each feature.
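The CFCs (Saabas values) compared in the abstract attribute, for each split on a sample's decision path, the change in the node's mean prediction to the split feature, so the contributions plus the root mean sum exactly to the model's output. A minimal sketch of that computation for a single scikit-learn regression tree (not the authors' code; the helper name `saabas_contributions` and the synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

def saabas_contributions(tree, x):
    """Conditional feature contributions (Saabas values) for one sample:
    walk the decision path and credit each split feature with the change
    in the node's mean prediction. Returns (bias, per-feature vector)."""
    t = tree.tree_
    contrib = np.zeros(x.shape[0])
    node = 0
    bias = t.value[0, 0, 0]  # root mean = expected model output
    while t.children_left[node] != -1:  # -1 marks a leaf in sklearn trees
        f = t.feature[node]
        nxt = (t.children_left[node] if x[f] <= t.threshold[node]
               else t.children_right[node])
        contrib[f] += t.value[nxt, 0, 0] - t.value[node, 0, 0]
        node = nxt
    return bias, contrib

bias, contrib = saabas_contributions(model, X[0])
# additivity check: bias + contributions reproduce the tree's prediction
pred = model.predict(X[:1])[0]
assert np.isclose(bias + contrib.sum(), pred)
```

For an ensemble, the same walk is averaged (random forest) or summed (boosting) over the trees; the paper's point is that these cheap path-based scores track TreeSHAP values closely in practice.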
Pages: 19-30
Page count: 12
References
20 records
[1]   Random forests [J].
Breiman, L.
MACHINE LEARNING, 2001, 45 (01) :5-32
[3]  
Coleman T, 2019, Arxiv, DOI [arXiv:1904.07830, DOI 10.48550/ARXIV.1904.07830]
[4]  
Covert IC, 2020, ADV NEUR IN, V33
[5]   Gene selection and classification of microarray data using random forest [J].
Díaz-Uriarte, R;
de Andrés, SA.
BMC BIOINFORMATICS, 2006, 7, Art. no. 3
[6]  
Dua D, 2017, UCI machine learning repository
[7]   Variable Importance Assessment in Regression: Linear Regression versus Random Forest [J].
Groemping, Ulrike.
AMERICAN STATISTICIAN, 2009, 63 (04) :308-319
[8]   Classification trees with unbiased multiway splits [J].
Kim, H;
Loh, WY.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (454) :589-604
[9]  
Loecher M., 2020, arXiv
[10]   Unbiased variable importance for random forests [J].
Loecher, Markus .
COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2022, 51 (05) :1413-1425