On Smoothed Explanations: Quality and Robustness

Cited by: 3
Authors
Ajalloeian, Ahmad [1 ]
Moosavi-Dezfooli, Seyed Mohsen [2 ]
Vlachos, Michalis [1 ]
Frossard, Pascal [3 ]
Affiliations
[1] Univ Lausanne, HEC, Lausanne, Switzerland
[2] Imperial Coll, London, England
[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022 | 2022
Keywords
Transparency; Explainable AI; Gradient-based explanations; Robust explanations; Neural Networks; Adversarial Robustness
DOI
10.1145/3511808.3557409
CLC number
TP [Automation Technology and Computer Technology];
Subject classification code
0812;
Abstract
Explanation methods highlight the importance of input features for a predictive decision, and are one way to increase the transparency and trustworthiness of machine learning models and deep neural networks (DNNs). However, explanation methods can be easily manipulated to generate misleading explanations, particularly under visually imperceptible adversarial perturbations. Recent work has identified the geometry of the decision surface of DNNs as the main cause of this phenomenon. To make explanation methods more robust against adversarially crafted perturbations, recent research has proposed several smoothing approaches, which smooth either the explanation map or the decision surface. In this work, we conduct a thorough evaluation of the quality and robustness of the explanations produced by these smoothing approaches, assessing several properties. We present settings in which the smoothed explanations are better, and others in which they are worse, than the explanations produced by the commonly used (non-smoothed) Gradient explanation method. By connecting this analysis with the literature on adversarial attacks, we demonstrate that such smoothed explanations are robust primarily against additive attacks. However, a combination of additive and non-additive attacks can still manipulate these explanations, revealing important shortcomings in their robustness properties.
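The abstract contrasts the plain Gradient explanation with smoothing approaches that average explanations over perturbed copies of the input (SmoothGrad-style smoothing of the explanation map). Below is a minimal sketch of that idea, assuming a PyTorch image classifier; the function names and the n_samples and sigma parameters are illustrative assumptions, not the authors' implementation.

import torch

def gradient_explanation(model, x, target):
    # Saliency map: gradient of the target-class logit w.r.t. the input.
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.detach()

def smoothed_explanation(model, x, target, n_samples=32, sigma=0.15):
    # SmoothGrad-style smoothing: average the gradient over Gaussian-noised copies of x.
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        grads += gradient_explanation(model, x + sigma * torch.randn_like(x), target)
    return grads / n_samples

# Toy usage (illustrative only): a linear classifier on 32x32 RGB inputs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
target = model(x).argmax(dim=1).item()
print(smoothed_explanation(model, x, target).shape)  # torch.Size([1, 3, 32, 32])

In the threat model the paper studies, an adversary perturbs x so that the explanation changes while the prediction stays essentially unchanged; the averaging above is the mechanism credited with robustness to additive perturbations.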
Pages: 15-25
Number of pages: 11