Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance
Cited by: 8
Authors:
Kerschke, Laura [1]
Weigel, Stefanie [2,3]
Rodriguez-Ruiz, Alejandro [4]
Karssemeijer, Nico [4,5]
Heindel, Walter [2,3]
Affiliations:
[1] Univ Munster, Inst Biostat & Clin Res, IBKF, Schmeddingstr 56, D-48149 Munster, Germany
[2] Univ Munster, Clin Radiol & Reference Ctr Mammog Muenster, Albert Schweitzer Campus 1, D-48149 Munster, Germany
[3] Univ Hosp Muenster, Albert Schweitzer Campus 1, D-48149 Munster, Germany
[4] ScreenPoint Med BV, Toernooiveld 300, NL-6525 EC Nijmegen, Netherlands
[5] Radboud Univ Nijmegen, Dept Radiol & Nucl Med, Med Ctr, Geert Grootepl Zuid 10, NL-6525 GA Nijmegen, Netherlands
Keywords:
Breast cancer;
Screening;
Mammography;
Artificial intelligence;
DUCTAL CARCINOMA;
AI;
DOI:
10.1007/s00330-021-08217-w
Chinese Library Classification (CLC) number:
R8 [Special medicine];
R445 [Diagnostic imaging];
Discipline classification codes:
1002;
100207;
1009;
Abstract:
Objectives: To evaluate if artificial intelligence (AI) can discriminate recalled benign from recalled malignant mammographic screening abnormalities to improve screening performance.
Methods: A total of 2257 full-field digital mammography screening examinations, obtained 2011-2013, of women aged 50-69 years which were recalled for further assessment of 295 malignant out of 305 truly malignant lesions and 2289 benign lesions after independent double-reading with arbitration, were included in this retrospective study. A deep learning AI system was used to obtain a score (0-95) for each recalled lesion, representing the likelihood of breast cancer. The sensitivity on the lesion level and the proportion of women without false-positive ratings (non-FPR) resulting under AI were estimated as a function of the classification cutoff and compared to that of human readers.
Results: Using a cutoff of 1, AI decreased the proportion of women with false-positives from 89.9 to 62.0%, non-FPR 11.1% vs. 38.0% (difference 26.9%, 95% confidence interval 25.1-28.8%; p < .001), preventing 30.1% of reader-induced false-positive recalls, while reducing sensitivity from 96.7 to 91.1% (5.6%, 3.1-8.0%) as compared to human reading. The positive predictive value of recall (PPV-1) increased from 12.8 to 16.5% (3.7%, 3.5-4.0%). In women with mass-related lesions (n = 900), the non-FPR was 14.2% for humans vs. 36.7% for AI (22.4%, 19.8-25.3%) at a sensitivity of 98.5% vs. 97.1% (1.5%, 0-3.5%).
Conclusion: The application of AI during consensus conference might especially help readers to reduce false-positive recalls of masses at the expense of a small sensitivity reduction. Prospective studies are needed to further evaluate the screening benefit of AI in practice.
Pages: 842-852
Number of pages: 11