Improving breast cancer diagnostics with deep learning for MRI

Cited by: 79
Authors
Witowski, Jan [1, 2]
Heacock, Laura [1]
Reig, Beatriu [1]
Kang, Stella K. [1, 3]
Lewin, Alana [1]
Pysarenko, Kristine [1]
Patel, Shalin [1]
Samreen, Naziya [1]
Rudnicki, Wojciech [4]
Luczynska, Elzbieta [4]
Popiela, Tadeusz [5]
Moy, Linda [1, 2, 6, 7]
Geras, Krzysztof J. [1, 2, 6, 7, 8, 9]
Affiliations
[1] New York Univ, Dept Radiol, Grossman Sch Med, New York, NY 10016 USA
[2] New York Univ, Ctr Adv Innovat & Res, New York, NY 10016 USA
[3] New York Univ, Dept Populat Hlth, Grossman Sch Med, New York, NY 10016 USA
[4] Jagiellonian Univ, Electradiol Dept, Med Coll, PL-31126 Krakow, Poland
[5] Jagiellonian Univ, Chair Radiol, Med Coll, PL-31501 Krakow, Poland
[6] New York Univ, Vilcek Inst Grad Biomed Sci, Grossman Sch Med, New York, NY 10016 USA
[7] New York Univ Langone Hlth, Perlmutter Canc Ctr, New York, NY 10016 USA
[8] New York Univ, Ctr Data Sci, New York, NY 10011 USA
[9] New York Univ, Courant Inst Math Sci, Dept Comp Sci, New York, NY 10012 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
MULTIREADER; ACCURACY; CURVES; SYSTEM; RISK;
DOI
10.1126/scitranslmed.abo4802
Chinese Library Classification number
Q2 [Cell biology];
Discipline classification code
071009; 090102;
Abstract
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has a high sensitivity in detecting breast cancer but often leads to unnecessary biopsies and patient workup. We used a deep learning (DL) system to improve the overall accuracy of breast cancer diagnosis and personalize management of patients undergoing DCE-MRI. On the internal test set (n = 3936 exams), our system achieved an area under the receiver operating characteristic curve (AUROC) of 0.92 (95% CI: 0.92 to 0.93). In a retrospective reader study, there was no statistically significant difference (P = 0.19) between five board-certified breast radiologists and the DL system (mean ΔAUROC, +0.04 in favor of the DL system). Radiologists' performance improved when their predictions were averaged with DL's predictions [mean ΔAUPRC (area under the precision-recall curve), +0.07]. We demonstrated the generalizability of the DL system using multiple datasets from Poland and the United States. An additional reader study on a Polish dataset showed that the DL system was as robust to distribution shift as radiologists. In subgroup analysis, we observed consistent results across different cancer subtypes and patient demographics. Using decision curve analysis, we showed that the DL system can reduce unnecessary biopsies in the range of clinically relevant risk thresholds. This would lead to avoiding biopsies yielding benign results in up to 20% of all patients with BI-RADS category 4 lesions. Last, we performed an error analysis, investigating situations where DL predictions were mostly incorrect. This exploratory work creates a foundation for deployment and prospective analysis of DL-based models for breast MRI.
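
The abstract refers to two computations that a small sketch may make more concrete: averaging a reader's probability of malignancy with the model's, and the net-benefit quantity that decision curve analysis plots against a risk threshold. The Python below is an illustration only, not the authors' code. The arrays are synthetic toy values, and the names `net_benefit`, `reader`, `model`, and `hybrid` are hypothetical. The formula is the standard net-benefit definition: true positives per exam minus false positives per exam, weighted by the odds of the threshold.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting (e.g. biopsy) on every exam with risk >= threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = y_true.size
    flagged = y_prob >= threshold
    tp = np.sum(flagged & (y_true == 1))  # malignant exams that were flagged
    fp = np.sum(flagged & (y_true == 0))  # benign exams that were flagged
    # False positives are weighted by the odds of the risk threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Synthetic toy data (assumption: not taken from the study).
y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1])
reader = np.array([0.8, 0.4, 0.2, 0.6, 0.5, 0.1, 0.3, 0.7])
model = np.array([0.9, 0.1, 0.1, 0.7, 0.2, 0.1, 0.2, 0.5])

# "Hybrid" prediction: simple average of reader and model probabilities.
hybrid = (reader + model) / 2.0

threshold = 0.3
nb_hybrid = net_benefit(y_true, hybrid, threshold)
# Reference strategy: biopsy everyone (every exam is flagged).
nb_all = net_benefit(y_true, np.ones_like(hybrid), threshold)

print(f"hybrid: {nb_hybrid:.4f}, biopsy-all: {nb_all:.4f}")
```

On this toy data the averaged prediction has a higher net benefit than the biopsy-all strategy at the chosen threshold. A full decision curve would repeat the calculation over a range of thresholds.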
Pages: 13