Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods

Times cited: 246
Authors
Handelman, Guy S. [1, 2]
Kok, Hong Kuan [3, 4]
Chandra, Ronil V. [5, 6]
Razavi, Amir H. [7, 8]
Huang, Shiwei [9]
Brooks, Mark [10]
Lee, Michael J. [2, 11]
Asadi, Hamed [5, 10, 12]
Affiliations
[1] Belfast City Hosp, Dept Radiol, 51 Lisburn Rd, Belfast BT9 7AB, Antrim, North Ireland
[2] Royal Coll Surgeons Ireland, Dublin, Ireland
[3] Northern Hosp Radiol, Intervent Radiol Serv, Epping, NSW, Australia
[4] Deakin Univ, Sch Med, Fac Hlth, Waurn Ponds, Australia
[5] Monash Hlth, Intervent Neuroradiol Serv, Monash Imaging, Clayton, Vic, Australia
[6] Monash Univ, Fac Med Nursing & Hlth Sci, Clayton, Vic, Australia
[7] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON, Canada
[8] BCE Corp Security, Ottawa, ON, Canada
[9] Australian Natl Univ, Med Sch, Garran, Australia
[10] Austin Hlth, Intervent Neuroradiol Serv, Dept Radiol, Heidelberg, Vic, Australia
[11] Beaumont Hosp, Dept Radiol, Dublin, Ireland
[12] Univ Melbourne, Florey Inst Neurosci & Mental Hlth, Melbourne, Vic, Australia
Keywords
artificial intelligence; machine learning; medicine; supervised machine learning; unsupervised machine learning; operating characteristic curves; diagnosis; validation
DOI
10.2214/AJR.18.20224
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline codes
1002; 100207; 1009
Abstract
OBJECTIVE. Machine learning (ML) and artificial intelligence (AI) are rapidly becoming the most talked-about and controversial topics in radiology and medicine. Over the past few years, the number of ML- and AI-focused studies in the literature has increased almost exponentially, and ML has become a hot topic at academic and industry conferences. However, despite the increased awareness of ML as a tool, many medical professionals have a poor understanding of how ML works and of how to critically appraise the studies and tools presented to us. We therefore present a brief overview of ML, explain the metrics used in ML and how to interpret them, and explain some of the technical jargon associated with the field, so that readers with a medical background and a basic knowledge of statistics can feel more comfortable when examining ML applications. CONCLUSION. Attention to sample size, overfitting, underfitting, and cross-validation, together with a broad knowledge of ML metrics, can help readers with little or no technical knowledge begin to assess ML studies. However, transparency in methods and the sharing of algorithms are vital to allow clinicians to assess these tools themselves.
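The conclusion names cross-validation and classification metrics as core tools for appraising an ML study. As a rough illustration only (this is not code from the article), the Python sketch below computes sensitivity, specificity, positive and negative predictive values, and accuracy from a confusion matrix; estimates ROC AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half); and generates plain k-fold cross-validation splits without shuffling or stratification. The function names and the toy labels and scores are hypothetical, invented for the example. Real analyses would more typically rely on a tested library such as scikit-learn (for example `sklearn.metrics.roc_auc_score` and `sklearn.model_selection.KFold`).

```python
"""Illustrative sketch only: hypothetical helper names and toy data, not the article's code."""


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary 0/1 labels, where 1 = condition present."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    """Common confusion-matrix metrics for a binary classifier."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value (precision)
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def pairwise_auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for plain k-fold CV (no shuffling, no stratification)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds get one extra sample so every index is tested once.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


if __name__ == "__main__":
    # Toy, invented labels: 1 = disease present, 0 = absent.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.5, 0.05]
    print(classification_metrics(y_true, y_pred))
    print(pairwise_auc(y_true, scores))  # 21 of 24 positive/negative pairs ranked correctly
    for train, test in k_fold_indices(len(y_true), 3):
        print("train:", train, "test:", test)
```

With the toy data, sensitivity is 0.75 (3 of 4 positives detected) and specificity is 5/6 (5 of 6 negatives correctly rejected). Keeping each test fold out of training is what lets cross-validation expose overfitting: a model that merely memorizes its training cases will score well on the training indices but poorly on the held-out ones.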
Pages: 38-43
Page count: 6