Deep Learning for the Diagnosis of Stage in Retinopathy of Prematurity: Accuracy and Generalizability across Populations and Cameras

Cited by: 39
Authors
Chen, Jimmy S. [1]
Coyner, Aaron S. [2]
Ostmo, Susan [1]
Sonmez, Kemal [3]
Bajimaya, Sanyam [4]
Pradhan, Eli [4]
Valikodath, Nita [5]
Cole, Emily D. [5]
Al-Khaled, Tala [5]
Chan, R. V. Paul [5]
Singh, Praveer [6]
Kalpathy-Cramer, Jayashree [6, 7, 8]
Chiang, Michael F. [1, 2]
Campbell, J. Peter [1]
Affiliations
[1] Oregon Hlth & Sci Univ, Casey Eye Inst, Dept Ophthalmol, 515 SW Campus Dr, Portland, OR 97239 USA
[2] Oregon Hlth & Sci Univ, Dept Med Informat & Clin Epidemiol, Portland, OR 97239 USA
[3] Oregon Hlth & Sci Univ, Canc Early Detect Adv Res Ctr, Knight Canc Inst, Portland, OR 97239 USA
[4] Tilganga Inst Ophthalmol, Kathmandu, Nepal
[5] Univ Illinois, Dept Ophthalmol & Visual Sci, Illinois Eye & Ear Infirm, Chicago, IL USA
[6] Massachusetts Gen Hosp, Dept Radiol, Athinoula A Martinos Ctr Biomed Imaging, Charlestown, MA USA
[7] Massachusetts Gen Hosp, Ctr Clin Data Sci, Boston, MA 02114 USA
[8] Brigham & Womens Hosp, 75 Francis St, Boston, MA 02115 USA
Funding
National Institutes of Health (USA);
Keywords
Artificial intelligence; Generalizability; Neural networks; Retinopathy of prematurity; Stage; DIABETIC-RETINOPATHY; CLASSIFICATION; VALIDATION; IMAGES;
DOI
10.1016/j.oret.2020.12.013
Chinese Library Classification
R77 [Ophthalmology];
Subject classification code
100212;
Abstract
Purpose: Stage is an important feature to identify in retinal images of infants at risk of retinopathy of prematurity (ROP). The purpose of this study was to implement a convolutional neural network (CNN) for binary detection of stages 1, 2, and 3 in ROP and to evaluate its generalizability across different populations and camera systems.
Design: Diagnostic validation study of a CNN for stage detection.
Participants: Retinal fundus images obtained from preterm infants during routine ROP screenings.
Methods: Two datasets were used: 5943 fundus images obtained by RetCam camera (Natus Medical, Pleasanton, CA) from 9 North American institutions and 5049 images obtained by 3nethra camera (Forus Health Incorporated, Bengaluru, India) from 4 hospitals in Nepal. Images were labeled based on the presence of stage by 1 to 3 expert graders. Three CNN models were trained using 5-fold cross-validation on datasets from North America alone, Nepal alone, and a combined dataset and were evaluated on 2 held-out test sets consisting of 708 and 247 images from the Nepali and North American datasets, respectively.
Main Outcome Measures: Convolutional neural network performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity.
Results: Both the North American- and Nepali-trained models demonstrated high performance on a test set from the same population (AUROC, 0.99; AUPRC, 0.98; sensitivity, 94% and AUROC, 0.97; AUPRC, 0.91; sensitivity, 73%, respectively). However, when each model was evaluated on a test set from the other population, performance decreased to AUROC, 0.96; AUPRC, 0.88; sensitivity, 52% and AUROC, 0.62; AUPRC, 0.36; sensitivity, 44%, respectively. Compared with the models trained on individual datasets, the model trained on a combined dataset achieved improved performance on each respective test set: sensitivity improved from 94% to 98% on the North American test set and from 73% to 82% on the Nepali test set.
Conclusions: A CNN can accurately identify the presence of ROP stage in retinal images, but performance depends on the similarity between training and testing populations. We demonstrated that internal and external performance can be improved by increasing the heterogeneity of the training dataset, in this case by combining images from different populations and cameras.
© 2020 by the American Academy of Ophthalmology
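Note on the outcome measures: the following is a minimal Python sketch of how AUROC, sensitivity, and specificity can be computed for a binary classifier such as a stage detector. The toy labels and scores, the 0.5 threshold, and the function names are illustrative assumptions only; they are not data, code, or operating points from the study.

```python
from typing import Sequence, Tuple

# Hypothetical toy data: 1 = stage present, 0 = stage absent.
TOY_LABELS = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model output probabilities for the same eight images.
TOY_SCORES = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    sens, spec = sensitivity_specificity(TOY_LABELS, TOY_SCORES)
    print(f"AUROC={auroc(TOY_LABELS, TOY_SCORES):.4f} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because sensitivity depends on the chosen threshold while AUROC does not, a model can keep a high AUROC on a new population and still lose sensitivity at a fixed threshold, which is consistent with the pattern of cross-population results reported in the abstract.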
Pages: 1027-1035
Page count: 9