Recommendations for the development and use of imaging test sets to investigate the test performance of artificial intelligence in health screening

Cited: 15
Authors
Chalkidou, Anastasia [1 ]
Shokraneh, Farhad [1 ]
Kijauskaite, Goda [2 ]
Taylor-Phillips, Sian [3 ]
Halligan, Steve [4 ]
Wilkinson, Louise [5 ]
Glocker, Ben [6 ]
Garrett, Peter [7 ]
Denniston, Alastair K. [8 ]
Mackie, Anne [2 ]
Seedat, Farah [2 ]
Affiliations
[1] Kings Coll London, Kings Technol Evaluat Ctr, London SE1 7EU, England
[2] UK Natl Screening Comm, Off Hlth Improvement & Dispar, Dept Hlth & Social Care, London, England
[3] Univ Warwick, Warwick Med Sch, Coventry, W Midlands, England
[4] UCL, Ctr Med Imaging, Div Med, London, England
[5] Univ Oxford, Oxford Breast Imaging Ctr, Oxford, England
[6] Imperial Coll London, Dept Comp, London, England
[7] Univ Manchester, Dept Chem Engn & Analyt Sci, Manchester, Lancs, England
[8] Univ Hosp Birmingham NHS Fdn Trust, Dept Ophthalmol, Birmingham, W Midlands, England
Keywords
DIABETIC-RETINOPATHY; TECHNOLOGY; ALGORITHMS; DIAGNOSIS; DATASET
DOI
10.1016/S2589-7500(22)00186-8
Chinese Library Classification: R-058
Abstract
Rigorous evaluation of artificial intelligence (AI) systems for image classification is essential before deployment into health-care settings, such as screening programmes, so that adoption is effective and safe. A key step in the evaluation process is the external validation of diagnostic performance using a test set of images. We conducted a rapid literature review of methods to develop test sets, covering publications from 2012 to 2020 in English. Using thematic analysis, we mapped themes and coded the principles using the Population, Intervention, Comparator or Reference standard, Outcome, and Study design framework. A group of screening and AI experts assessed the evidence-based principles for completeness and provided further considerations. Of the final 15 principles recommended here, five concern the population, one the intervention, two the comparator, one the reference standard, and one both the reference standard and the comparator; four are applicable to the outcome and one to the study design. Principles from the literature were useful for addressing biases arising from AI; however, they did not account for screening-specific biases, which we now incorporate. The principles set out here should be used to support the development and use of test sets for studies that assess the accuracy of AI within screening programmes, to ensure they are fit for purpose and minimise bias.
Pages: E899-E905
Number of pages: 7