Evaluating How Explainable AI Is Perceived in the Medical Domain: A Human-Centered Quantitative Study of XAI in Chest X-Ray Diagnostics

Cited by: 1
Authors
Karagoz, Gizem [1]
van Kollenburg, Geert [1]
Ozcelebi, Tanir [1]
Meratnia, Nirvana [1]
Affiliations
[1] Eindhoven University of Technology, Eindhoven, Netherlands
Source
TRUSTWORTHY ARTIFICIAL INTELLIGENCE FOR HEALTHCARE, TAI4H 2024 | 2024 / Vol. 14812
Keywords
Explainable AI; XAI Evaluation; Human-Centered Evaluation; XAI in Healthcare; Medical Imaging; ARTIFICIAL-INTELLIGENCE;
DOI
10.1007/978-3-031-67751-9_8
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The crucial role of Explainable Artificial Intelligence (XAI) in healthcare is underscored by the need for both accurate diagnosis and transparent decision making, which together improve trust in AI decisions and facilitate adoption by medical professionals. In this paper, we present the results of a quantitative user study evaluating how widely used XAI methods are perceived by medical experts. To do so, we use two prominent post-hoc, model-agnostic XAI methods: Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). A considerable cohort of 97 medical experts was recruited to investigate whether these XAI methods assist medical experts in their diagnosis of chest X-ray scans. We designed an evaluation framework to investigate diagnostic accuracy, trust change, coherence with expert reasoning, and confidence differences before and after seeing the provided XAI explanations. This large-scale study showed that both XAI methods improve scores on indicative explanations. The overall change in trust was not significantly different between LIME and SHAP, indicating that factors beyond providing explanations contribute to trust enhancement in AI diagnostics. This work proposes a robust, human-centered benchmark that supports the research and development of interpretable, reliable, and clinically aligned AI tools and directs the future of AI in high-stakes healthcare applications towards enhanced transparency and accountability.
Pages: 92-108
Page count: 17