Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

Cited by: 25
Authors
Dai, Jessica [1 ]
Upadhyay, Sohini [2 ]
Aivodji, Ulrich [3 ]
Bach, Stephen H. [1 ]
Lakkaraju, Himabindu [2 ]
Affiliations
[1] Brown Univ, Providence, RI 02912 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Univ Quebec Montreal, Montreal, PQ, Canada
Source
Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES 2022) | 2022
Keywords
explainable machine learning; interpretability; fairness; robustness
DOI
10.1145/3514094.3534159
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across all subgroups of a population. For instance, it should not be the case that explanations associated with instances belonging to, e.g., women, are less accurate than those associated with other genders. In this work, we initiate the study of identifying group-based disparities in explanation quality. To this end, we first outline several key properties that contribute to explanation quality, namely fidelity (accuracy), stability, consistency, and sparsity, and discuss why and how disparities in these properties can be particularly problematic. We then propose an evaluation framework which can quantitatively measure disparities in the quality of explanations. Using this framework, we carry out an empirical analysis with three datasets, six post hoc explanation methods, and different model classes to understand if and when group-based disparities in explanation quality arise. Our results indicate that such disparities are more likely to occur when the models being explained are complex and non-linear. We also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit disparities. Our work sheds light on previously unexplored ways in which explanation methods may introduce unfairness in real-world decision making.
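To make the idea of a quantitative disparity metric concrete, the following is a minimal sketch, not the authors' released code or exact framework, of how a per-group fidelity gap could be estimated for a feature-attribution explainer. It approximates local fidelity as the correlation between black-box outputs and a linear surrogate induced by the attribution vector on small Gaussian perturbations around each instance, averages that score per subgroup, and reports the largest gap between groups. The names `model_predict` and `explain_fn`, and the perturbation scheme, are illustrative assumptions.

```python
# Hedged sketch: per-group fidelity gap for post hoc explanations.
# Assumptions (not from the paper): `model_predict(X)` returns a 1-D array of
# black-box scores for a 2-D input, and `explain_fn(x)` returns a feature-
# attribution vector for a single instance.
import numpy as np

def local_fidelity(model_predict, explain_fn, x, n_samples=100, sigma=0.1, seed=0):
    """Correlation between black-box outputs and the explanation's linear
    approximation on small perturbations around x (higher = more faithful)."""
    rng = np.random.default_rng(seed)
    attributions = explain_fn(x)                       # importance vector for x
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    X_local = x + noise                                # local neighborhood of x
    f_true = model_predict(X_local)                    # black-box scores
    f_surrogate = X_local @ attributions               # linear surrogate scores
    return np.corrcoef(f_true, f_surrogate)[0, 1]

def fidelity_gap(model_predict, explain_fn, X, group_labels):
    """Mean local fidelity per group and the max pairwise disparity."""
    scores = {}
    for g in np.unique(group_labels):
        Xg = X[group_labels == g]
        scores[g] = np.mean(
            [local_fidelity(model_predict, explain_fn, x) for x in Xg]
        )
    return scores, max(scores.values()) - min(scores.values())
```

A disparity audit along these lines would be repeated for each quality property the abstract lists (fidelity, stability, consistency, sparsity), with the appropriate per-instance score substituted for `local_fidelity`.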
Pages: 203-214
Page count: 12