Independent External Validation of Artificial Intelligence Algorithms for Automated Interpretation of Screening Mammography: A Systematic Review

Cited by: 24
Authors
Anderson, Anna W. [1]
Marinovich, M. Luke [2]
Houssami, Nehmat [3, 4]
Lowry, Kathryn P. [1]
Elmore, Joann G. [5, 6]
Buist, Diana S. M. [7]
Hofvind, Solveig [7, 8]
Lee, Christoph I. [1, 9]
Affiliations
[1] Univ Washington, Sch Med, Dept Radiol, Seattle, WA 98195 USA
[2] Curtin Univ, Curtin Sch Populat Hlth, Bentley, WA, Australia
[3] Univ Sydney, Joint Venture Canc Council NSW, Daffodil Ctr, Sydney, NSW, Australia
[4] Univ Sydney, NBCF Chair Breast Canc Prevent, Sydney, NSW, Australia
[5] Univ Calif Los Angeles, David Geffen Sch Med, Los Angeles, CA USA
[6] Univ Calif Los Angeles, UCLAs Natl Clinician Scholars Program, Los Angeles, CA USA
[7] Kaiser Permanente Washington Hlth Res Inst, Seattle, WA USA
[8] Canc Registry Norway, Sect Head Breast Canc Screening, Oslo, Norway
[9] Univ Washington, Northwest Screening & Canc Outcomes Res Enterpris, Seattle, WA 98195 USA
Keywords
DIAGNOSTIC-TEST ACCURACY; BREAST-CANCER; IMAGE-ANALYSIS; METAANALYSIS; AI;
DOI
10.1016/j.jacr.2021.11.008
CLC classification: R8 [Special medicine]; R445 [Diagnostic imaging]
Subject classification codes: 1002; 100207; 1009
Abstract
Purpose: The aim of this study was to describe the current state of science regarding independent external validation of artificial intelligence (AI) technologies for screening mammography.

Methods: A systematic review was performed across five databases (Embase, PubMed, IEEE Xplore, Engineering Village, and arXiv) through December 10, 2020. Studies that used screening examinations from real-world settings to externally validate AI algorithms for mammographic cancer detection were included. The main outcome was diagnostic accuracy, defined by area under the receiver operating characteristic curve (AUC). Performance was also compared between radiologists and either stand-alone AI or combined radiologist and AI interpretation. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool.

Results: After data extraction, 13 studies met the inclusion criteria (148,361 total patients). Most studies (77% [n = 10]) evaluated commercially available AI algorithms. Studies included retrospective reader studies (46% [n = 6]), retrospective simulation studies (38% [n = 5]), or both (15% [n = 2]). Across 5 studies comparing stand-alone AI with radiologists, 60% (n = 3) demonstrated improved accuracy with AI (AUC improvement range, 0.02-0.13). All 5 studies comparing combined radiologist and AI interpretation with radiologists alone demonstrated improved accuracy with AI (AUC improvement range, 0.028-0.115). Most studies had risk for bias or applicability concerns for patient selection (69% [n = 9]) and the reference standard (69% [n = 9]). Only two studies obtained ground-truth cancer outcomes through regional cancer registry linkage.

Conclusions: To date, external validation efforts for AI screening mammographic technologies suggest small potential diagnostic accuracy improvements but have been retrospective in nature and suffer from risk for bias and applicability concerns. Copyright (C) 2021 American College of Radiology
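The review's primary accuracy outcome is the AUC, which can be read as the probability that a randomly chosen cancer case receives a higher AI suspicion score than a randomly chosen non-cancer case. As background only (this is not code from the reviewed studies, and the scores below are illustrative), a minimal sketch of that rank-based (Mann-Whitney U) computation:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) score pairs in which the
    positive case scores higher, counting ties as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical AI suspicion scores for cancer vs. non-cancer exams.
cancer = [0.9, 0.8, 0.7]
normal = [0.6, 0.5, 0.8, 0.2]
print(auc(cancer, normal))  # 0.875
```

On this scale, the AUC improvements reported in the review (e.g., 0.02-0.13 for stand-alone AI) correspond to small shifts in this pairwise ranking probability.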
Pages: 259-273 (15 pages)