What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use

Times cited: 0
Authors
Tonekaboni, Sana [1, 2, 4]
Joshi, Shalmali [2]
McCradden, Melissa D. [2, 3, 4]
Goldenberg, Anna [1, 2, 4]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[2] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[3] Hosp Sick Children, Dept Bioeth, Toronto, ON, Canada
[4] Hosp Sick Children, Dept Genet & Genome Biol, Toronto, ON, Canada
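Source
MACHINE LEARNING FOR HEALTHCARE CONFERENCE, VOL 106 | 2019 / Vol. 106
Keywords
SCORE; MODEL; SATURATION; INTERVIEWS
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract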
Translating machine learning (ML) models effectively to clinical practice requires establishing clinicians' trust. Explainability, or the ability of an ML model to justify its outcomes and assist clinicians in rationalizing the model prediction, has been generally understood to be critical to establishing trust. However, the field suffers from the lack of concrete definitions for usable explanations in different settings. To identify specific aspects of explainability that may catalyze building trust in ML models, we surveyed clinicians from two distinct acute care specialties (Intensive Care Unit and Emergency Department). We use their feedback to characterize when explainability helps to improve clinicians' trust in ML models. We further identify the classes of explanations that clinicians identified as most relevant and crucial for effective translation to clinical practice. Finally, we discern concrete metrics for rigorous evaluation of clinical explainability methods. By integrating perceptions of explainability between clinicians and ML researchers, we hope to facilitate the endorsement, broader adoption, and sustained use of ML systems in healthcare.
Pages: 21