Artificial Intelligence Evaluation of 122 969 Mammography Examinations from a Population-based Screening Program

Cited by: 69
Authors
Larsen, Marthe [1]
Aglen, Camilla F. [1]
Lee, Christoph I. [5, 6]
Hoff, Solveig R. [7, 8]
Lund-Hanssen, Hakon [9]
Lang, Kristina [10, 11]
Nygard, Jan F. [2]
Ursin, Giske [3]
Hofvind, Solveig [1, 4]
Affiliations
[1] Sect Breast Canc Screening, POB 5313, N-0304 Oslo, Norway
[2] Dept Register Informat, POB 5313, N-0304 Oslo, Norway
[3] Canc Registry Norway, POB 5313, N-0304 Oslo, Norway
[4] Arctic Univ Norway, Fac Hlth Sci, Dept Hlth & Care Sci, Tromso, Norway
[5] Univ Washington, Sch Med, Dept Radiol, Seattle, WA 98195 USA
[6] Univ Washington, Sch Publ Hlth, Dept Hlth Syst & Populat Hlth, Seattle, WA 98195 USA
[7] Alesund Hosp, More & Romsdal Hosp Trust, Dept Radiol, Alesund, Norway
[8] Natl Univ Sci & Technol, Fac Med & Hlth Sci, Dept Circulat & Med Imaging, Trondheim, Norway
[9] St Olavs Univ Hosp, Dept Radiol & Nucl Med, Trondheim, Norway
[10] Lund Univ, Dept Translat Med, Lund, Sweden
[11] Skane Univ Hosp, Unilabs Mammog Unit, Malmo, Sweden
Keywords
CANCER; PERFORMANCE; INTERVAL; FILM;
DOI
10.1148/radiol.212381
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Background: Artificial intelligence (AI) has shown promising results for cancer detection with mammographic screening. However, evidence related to the use of AI in real screening settings remains sparse.
Purpose: To compare the performance of a commercially available AI system with routine, independent double reading with consensus as performed in a population-based screening program. Furthermore, the histopathologic characteristics of tumors with different AI scores were explored.
Materials and Methods: In this retrospective study, 122 969 screening examinations from 47 877 women performed at four screening units in BreastScreen Norway from October 2009 to December 2018 were included. The data set included 752 screen-detected cancers (6.1 per 1000 examinations) and 205 interval cancers (1.7 per 1000 examinations). Each examination had an AI score between 1 and 10, where 1 indicated low risk of breast cancer and 10 indicated high risk. Three thresholds were used to assess the performance of the AI system as a binary decision tool (selected vs not selected). Threshold 1 was set at an AI score of 10, threshold 2 was set to yield a selection rate similar to the consensus rate (8.8%), and threshold 3 was set to yield a selection rate similar to that of an average individual radiologist (5.8%). Descriptive statistics were used to summarize screening outcomes.
Results: A total of 653 of 752 screen-detected cancers (86.8%) and 92 of 205 interval cancers (44.9%) were given a score of 10 by the AI system (threshold 1). Using threshold 3, 80.1% of the screen-detected cancers (602 of 752) and 30.7% of the interval cancers (63 of 205) were selected. Screen-detected cancers that were not selected at these thresholds had more favorable histopathologic characteristics than those selected; the opposite was observed for interval cancers.
Conclusion: The proportion of screen-detected cancers not selected by the artificial intelligence (AI) system at the three evaluated thresholds was less than 20%. The overall performance of the AI system was promising in terms of cancer detection. © RSNA, 2022
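The percentages and per-1000 rates quoted in the abstract can be re-derived from the raw counts it reports. The sketch below is only an arithmetic check: the helper names `pct` and `per_1000` are illustrative and not taken from the paper, and threshold 2 is left out because the abstract gives no counts for it.

```python
# Arithmetic check of the figures quoted in the abstract.
# Helper names are hypothetical; all counts come from the abstract itself.

def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

def per_1000(cases: int, exams: int) -> float:
    """Cases per 1000 examinations, rounded to one decimal."""
    return round(1000 * cases / exams, 1)

EXAMS = 122_969          # screening examinations
SCREEN_DETECTED = 752    # screen-detected cancers
INTERVAL = 205           # interval cancers

# Cancers selected by the AI system at each threshold with reported counts.
selected = {
    "threshold 1": {"screen_detected": 653, "interval": 92},
    "threshold 3": {"screen_detected": 602, "interval": 63},
}

print("screen-detected per 1000:", per_1000(SCREEN_DETECTED, EXAMS))
print("interval per 1000:", per_1000(INTERVAL, EXAMS))

for name, counts in selected.items():
    sd = counts["screen_detected"]
    ic = counts["interval"]
    print(
        name,
        "| screen-detected selected:", pct(sd, SCREEN_DETECTED),
        "| interval selected:", pct(ic, INTERVAL),
        "| screen-detected not selected:", pct(SCREEN_DETECTED - sd, SCREEN_DETECTED),
    )
```

At both thresholds with reported counts, the share of screen-detected cancers not selected comes out below 20%, which is consistent with the conclusion as stated.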
Pages: 502-511
Page count: 10