Visual interpretation of deep learning model in ECG classification: A comprehensive evaluation of feature attribution methods

Cited by: 0

Authors:

Suh, Jangwon [1]

Kim, Jimyeong [1]

Kwon, Soonil [2]

Jung, Euna [3]

Ahn, Hyo-Jeong [4]

Lee, Kyung-Yeon [4]

Choi, Eue-Keun [4,5]

Rhee, Wonjong [1,6]

Affiliations:

[1] Department of Intelligence and Information, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

[2] Division of Cardiology, Department of Internal Medicine, SMG–SNU Boramae Medical Center, 20, Boramae-ro 5-gil, Dongjak-gu, Seoul 07061, Republic of Korea

[3] Samsung Advanced Institute of Technology, Samsung Electronics, 130, Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea

[4] Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, 101, Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea

[5] Department of Internal Medicine, Seoul National University College of Medicine, 103, Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea

[6] Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

Source:

Computers in Biology and Medicine | 2024 / Volume 182

Funding:

National Research Foundation, Singapore;

Keywords:

Automatic evaluation - Comprehensive evaluation - Deep learning - Explainable artificial intelligence - Feature attribution - Human evaluation - Interpretability - Learning models - Model prediction - Visual interpretation;

DOI:

10.1016/j.compbiomed.2024.109088

Abstract:

Feature attribution methods can visually highlight specific input regions containing influential aspects affecting a deep learning model's prediction. Recently, the use of feature attribution methods in electrocardiogram (ECG) classification has been sharply increasing, as they assist clinicians in understanding the model's decision-making process and assessing the model's reliability. However, a careful study to identify suitable methods for ECG datasets has been lacking, leading researchers to select methods without a thorough understanding of their appropriateness. In this work, we conduct a large-scale assessment by considering eleven popular feature attribution methods across five large ECG datasets using a model based on the ResNet-18 architecture. Our experiments include both automatic evaluations and human evaluations. Annotated datasets were utilized for automatic evaluations and three cardiac experts were involved for human evaluations. We found that Guided Grad-CAM, particularly when its absolute values are utilized, achieves the best performance. When Guided Grad-CAM was utilized as the feature attribution method, cardiac experts confirmed that it can identify diagnostically relevant electrophysiological characteristics, although its effectiveness varied across the 17 different diagnoses that we have investigated. © 2024 Elsevier Ltd
