Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs

Cited by: 156
Authors
Dunnmon, Jared A. [1]
Yi, Darvin [2]
Langlotz, Curtis P. [3]
Re, Christopher [1]
Rubin, Daniel L. [2, 3]
Lungren, Matthew P. [3]
Affiliations
[1] Stanford Univ, Dept Comp Sci, 300 Pasteur Dr, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, 300 Pasteur Dr, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Radiol, 300 Pasteur Dr, Stanford, CA 94305 USA
Funding
US National Institutes of Health (NIH)
Keywords
OPERATING CHARACTERISTIC CURVES; DEEP; PERFORMANCE; AREAS;
DOI
10.1148/radiol.2018181422
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Purpose: To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs.
Materials and Methods: In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed.
Results: Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease.
Conclusion: CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs. © RSNA, 2018
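The two quantities the abstract relies on, AUC and the average of a CNN score with a binary prospective label, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `average_with_label` are hypothetical, the six cases are made-up toy values, and the pairwise AUC computation is a standard formulation assumed here for illustration. It is not meant to reproduce any reported figure.

```python
def auc(labels, scores):
    """AUC as the fraction of (abnormal, normal) pairs in which the abnormal
    case receives the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_with_label(cnn_scores, prospective_labels):
    """Combine each CNN score in [0, 1] with a 0/1 label by simple averaging."""
    return [(s + float(l)) / 2.0 for s, l in zip(cnn_scores, prospective_labels)]


# Toy, made-up data (assumption): 1 = abnormal, 0 = normal.
truth = [1, 1, 1, 0, 0, 0]                    # hypothetical expert labels
cnn_scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]   # hypothetical CNN outputs
prospective = [1, 1, 0, 0, 0, 1]              # hypothetical prospective labels

combined = average_with_label(cnn_scores, prospective)
print("CNN alone:     ", auc(truth, cnn_scores))
print("Label alone:   ", auc(truth, prospective))
print("Averaged score:", auc(truth, combined))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few hundred images; a rank-based implementation would be the usual choice for larger sets.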
Pages: 537-544
Number of pages: 8