CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations

Cited by: 78
Authors
Arras, Leila [1 ]
Osman, Ahmed [1 ]
Samek, Wojciech [1 ]
Affiliations
[1] Fraunhofer Heinrich Hertz Inst, Dept Artificial Intelligence, D-10587 Berlin, Germany
Keywords
Explainable AI; Evaluation; Benchmark; Convolutional neural network; Visual question answering; Computer vision; Relation network; CLASSIFICATION;
DOI
10.1016/j.inffus.2021.11.008
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The rise of deep learning in today's applications has entailed an increasing need to explain a model's decisions beyond its prediction performance, in order to foster trust and accountability. Recently, the field of explainable AI (XAI) has developed methods that provide such explanations for already trained neural networks. In computer vision tasks, such explanations, termed heatmaps, visualize the contributions of individual pixels to the prediction. So far, XAI methods and their heatmaps have mainly been validated qualitatively via human-based assessment, or evaluated through auxiliary proxy tasks such as pixel perturbation, weak object localization, or randomization tests. Due to the lack of an objective and commonly accepted quality measure for heatmaps, it has been debatable which XAI method performs best and whether explanations can be trusted at all. In the present work, we tackle this problem by proposing a ground-truth-based evaluation framework for XAI methods built on the CLEVR visual question answering task. Our framework provides a (1) selective, (2) controlled, and (3) realistic testbed for the evaluation of neural network explanations. We compare ten different explanation methods, resulting in new insights about the quality and properties of XAI methods, sometimes contradicting conclusions from previous comparative studies. The CLEVR-XAI dataset and the benchmarking code can be found at https://github.com/ahmedmagdiosman/clevr-xai.
Pages: 14-40 (27 pages)