Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature

Cited by: 2
Authors
Bhalla, Deeksha [1]
Rangarajan, Krithika [1, 3]
Chandra, Tany [1]
Banerjee, Subhashis [2]
Arora, Chetan [2]
Affiliations
[1] All India Inst Med Sci, Dept Radiodiag, New Delhi, India
[2] Indian Inst Technol, Dept Comp Sci & Engn, New Delhi, India
[3] All India Inst Med Sci, Room 47A IRCH, New Delhi 10029, India
Keywords
artificial intelligence; breast cancer; deep learning; mammography; neural networks; systematic review; COMPUTER-AIDED DETECTION; CONVOLUTIONAL NEURAL-NETWORKS; BREAST-CANCER DIAGNOSIS; SCREENING MAMMOGRAPHY; DIGITAL MAMMOGRAMS; MISSED CANCERS; CLASSIFICATION; SENSITIVITY; ACCURACY; SCHEME;
DOI
10.1055/s-0043-1775737
Chinese Library Classification number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Background: Although abundant literature is available on the use of deep learning for breast cancer detection in mammography, the quality of this literature varies widely.
Purpose: To evaluate published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design.
Methods: The PubMed and Scopus databases were searched to identify records that described the use of deep learning to detect lesions or to classify images as cancer or noncancer. A modified Quality Assessment of Diagnostic Accuracy Studies tool (mQUADAS-2) was developed for this review and applied to the included studies. The reported results of each study (area under the receiver operating characteristic [ROC] curve [AUC], sensitivity, and specificity) were recorded.
Results: A total of 12,123 records were screened, of which 107 fit the inclusion criteria. Training and test datasets, the key idea behind the model architecture, and results were recorded for these studies. Based on the mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality, of which three trained their own model and one used a commercial network. Ensemble models were used in two of these. Common strategies used for model training included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, while it reached 0.945 (0.919-0.968) on an enriched subset. Combined radiologist and artificial intelligence (AI) readings reached higher AUC (0.955) and specificity (98.5%) than either alone. None of the studies provided explainability beyond localization accuracy, and none examined the interaction between AI and radiologists in a real-world setting.
Conclusion: While deep learning holds much promise in mammography interpretation, evaluation in a reproducible clinical setting and explainable networks are the need of the hour.
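For readers unfamiliar with the three metrics the review records, the following is a minimal Python sketch of how sensitivity, specificity, and AUC can be computed from per-image labels and model scores. It is an illustration only, not the method of any reviewed study; the function names, the 0.5 threshold, and the toy data are all assumptions introduced here.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) at a score threshold.

    labels: 1 = cancer, 0 = noncancer (hypothetical encoding).
    scores: model outputs; a score >= threshold counts as a positive call.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason values from screening and enriched datasets are not directly comparable.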
Pages: 469-487
Page count: 19