Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence

Cited by: 16
Authors
Kim, Cherry [1]
Yang, Zepa [2]
Park, Seong Ho [3, 4]
Hwang, Sung Ho [5]
Oh, Yu-Whan [5]
Kang, Eun-Young [6]
Yong, Hwan Seok [6]
Affiliations
[1] Korea Univ, Ansan Hosp, Dept Radiol, Coll Med, 123 Jeokgeum Ro, Ansan 15355, Gyeonggi, South Korea
[2] Korea Univ, Guro Hosp, Biomed Res Ctr, Coll Med, Seoul 08308, South Korea
[3] Univ Ulsan, Asan Med Ctr, Dept Radiol, Coll Med, Seoul 05505, South Korea
[4] Univ Ulsan, Res Inst Radiol, Asan Med Ctr, Coll Med, Seoul 05505, South Korea
[5] Korea Univ, Anam Hosp, Dept Radiol, Coll Med, Seoul 02841, South Korea
[6] Korea Univ, Guro Hosp, Dept Radiol, Coll Med, 33-41 Gurodong Ro 28 Gil, Seoul 08308, South Korea
Keywords
Artificial intelligence; Thoracic radiography; Software; Multicentre study; Validation study; Clinical validation
DOI
10.1007/s00330-022-09315-z
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Objectives: To externally validate the performance of a commercial AI software program for interpreting chest radiographs (CXRs) in a large, consecutive, real-world cohort from primary healthcare centres.

Methods: A total of 3047 CXRs were collected from two primary healthcare centres, characterised by low disease prevalence, between January and December 2018. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice, once with and once without AI assistance. The performances of the AI and of the readers with and without AI assistance were measured in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.

Results: The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI], 0.630-0.665), 35.3% (CI, 24.7-47.8), and 94.2% (CI, 93.3-95.0), respectively. The AI detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumours. Lesions missed by the AI tended to be smaller than true-positive lesions. The readers' AUROCs ranged from 0.534 to 0.676 without AI and from 0.571 to 0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96-10.27 s longer with AI assistance (all p values < 0.05).

Conclusions: The performance of commercial AI in these high-volume, low-prevalence settings was poorer than expected, although it modestly boosted the performance of less-experienced readers. The technical prowess of AI demonstrated in experimental settings and approved by regulatory bodies may not directly translate to real-world practice, especially where the demand for AI assistance is highest.
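As a quick consistency check, the per-disease detection counts quoted in the abstract can be summed to recover the reported prevalence and sensitivity. The following is a minimal sketch that uses only numbers stated in the abstract; the variable names are illustrative and are not taken from the paper, and it does not attempt to reproduce the confidence intervals, whose calculation method the abstract does not state.

```python
# Counts quoted in the abstract: (detected by AI, total cases) per lesion type.
total_cxrs = 3047
lesion_cases = 68
detected_by_type = {
    "pneumonia": (12, 41),
    "tuberculosis": (3, 5),
    "tumour": (9, 22),
}

# Sum detections and case totals across lesion types.
true_positives = sum(hit for hit, _ in detected_by_type.values())
positives = sum(n for _, n in detected_by_type.values())

# The per-type totals should add up to the 68 clinically significant lesions.
assert positives == lesion_cases

prevalence = lesion_cases / total_cxrs
sensitivity = true_positives / positives

print(f"prevalence  = {prevalence:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
```

Under these counts, 24 of 68 lesions were detected, which agrees with the reported sensitivity of 35.3% and prevalence of 2.2%.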
Pages: 3501-3509
Number of pages: 9