Sparse Attacks for Manipulating Explanations in Deep Neural Network Models

Cited by: 1
Authors
Ajalloeian, Ahmad [1 ]
Moosavi-Dezfooli, Seyed Mohsen [2 ]
Vlachos, Michalis [1 ]
Frossard, Pascal [3 ]
Affiliations
[1] Univ Lausanne, HEC, Lausanne, Switzerland
[2] Imperial Coll, London, England
[3] Ecole Polytech Fed Lausanne EPFL, Lausanne, Switzerland
Source
23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023 | 2023
Keywords
Explainable AI; Deep Neural Networks; Adversarial Attacks; Sparse Perturbation; Fairness
DOI
10.1109/ICDM58522.2023.00101
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate methods for manipulating classifier explanations while keeping the predictions unchanged. Our focus is on sparse attacks, which seek to alter only a minimal number of input features. We present a novel and efficient algorithm for computing sparse perturbations that alter the explanations but leave the predictions unaffected. We demonstrate that, compared to PGD attacks with an ℓ0 constraint, our algorithm generates sparser perturbations while producing greater discrepancies between the original and manipulated explanations. Moreover, we demonstrate that it is also possible to conceal the attribution of the k most significant features in the original explanation by perturbing fewer than k features of the input data. We present results for both image and tabular datasets, and emphasize the significance of sparse-perturbation-based attacks for trustworthy model building in high-stakes applications. Our research reveals important vulnerabilities in explanation methods that should be taken into account when developing reliable explanation methods. Code can be found at https://github.com/ahmadajal/sparse_expl_attacks
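To make the attack setting concrete: the sketch below is not the authors' algorithm, but a minimal hypothetical illustration of the same objective on a toy linear model. It greedily perturbs at most a few input features (an ℓ0-style budget) to push an input-times-gradient saliency map away from the original explanation, rejecting any candidate perturbation that would change the predicted class. All names (`predict`, `explanation`, `sparse_expl_attack`) and the toy model are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # toy linear classifier: logits = W @ x

def predict(x):
    """Predicted class index for input x."""
    return int(np.argmax(W @ x))

def explanation(x):
    """Input-times-gradient saliency for the predicted class."""
    return x * W[predict(x)]

def sparse_expl_attack(x, n_features=2, step=0.5):
    """Greedy ℓ0-budgeted attack (illustrative only): at each step, try
    perturbing each single feature by +/-step, keep the candidate that most
    changes the explanation, and discard any candidate whose prediction
    differs from the original one."""
    x_adv = x.copy()
    orig_pred = predict(x)
    orig_expl = explanation(x)
    for _ in range(n_features):
        best_gain, best = 0.0, None
        for i in range(len(x_adv)):
            for s in (step, -step):
                cand = x_adv.copy()
                cand[i] += s
                if predict(cand) != orig_pred:
                    continue  # the prediction must stay unchanged
                gain = float(np.linalg.norm(explanation(cand) - orig_expl))
                if gain > best_gain:
                    best_gain, best = gain, cand
        if best is None:
            break  # every single-feature step would flip the prediction
        x_adv = best
    return x_adv

x = rng.normal(size=8)
x_adv = sparse_expl_attack(x)
print("prediction preserved:", predict(x_adv) == predict(x))
print("features changed:", int(np.sum(x_adv != x)))
```

The exhaustive per-feature search makes the sparsity budget explicit but scales poorly; the paper's contribution is precisely a more efficient way to find such sparse, prediction-preserving perturbations.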
Pages: 918-923
Number of pages: 6