Multi-Institutional Assessment and Crowdsourcing Evaluation of Deep Learning for Automated Classification of Breast Density

Cited by: 41
Authors
Chang, Ken [1]
Beers, Andrew L. [1]
Brink, Laura [2]
Patel, Jay B. [1]
Singh, Praveer [1]
Arun, Nishanth T. [1]
Hoebel, Katharina V. [1]
Gaw, Nathan [1]
Shah, Meesam [2]
Pisano, Etta D. [3, 4]
Tilkin, Mike [5]
Coombs, Laura P. [3]
Dreyer, Keith J. [6, 7, 8, 9, 10]
Allen, Bibb [10, 11, 12]
Agarwal, Sheela [13]
Kalpathy-Cramer, Jayashree [1, 14, 15, 16, 17]
Affiliations
[1] Massachusetts Gen Hosp, Dept Radiol, Athinoula A Martinos Ctr Biomed Imaging, Boston, MA USA
[2] Amer Coll Radiol, Reston, VA USA
[3] ACR, Reston, VA USA
[4] Beth Israel Lahey Harvard Med Sch, Residence, Boston, MA USA
[5] ACR, Technol, Reston, VA USA
[6] MGH & BWH, Boston, MA USA
[7] MGH & BWH, Ctr Clin Data Sci, Boston, MA USA
[8] MGH & BWH, Radiol Informat, Boston, MA USA
[9] Harvard Med Sch, Radiol, Boston, MA 02115 USA
[10] ACR Data Sci Inst, Reston, VA USA
[11] Int Soc Radiol, Reston, VA USA
[12] Grandview Med Ctr, Birmingham, AL USA
[13] Lennox Hill Radiol, New York, NY USA
[14] Harvard Med Sch, CCDS, Boston, MA 02115 USA
[15] Harvard Med Sch, QTIM Lab, Boston, MA 02115 USA
[16] Harvard Med Sch, Ctr Machine Learning, Boston, MA 02115 USA
[17] Harvard Med Sch, Radiol, MGH, Boston, MA 02115 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
ACR AI-LAB; artificial intelligence; BI-RADS; breast density; deep learning; DMIST; generalizability; mammogram; neural networks; CANCER; MAMMOGRAPHY; RADIOLOGISTS; PERFORMANCE; IMPACT; RISK;
DOI
10.1016/j.jacr.2020.05.015
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002; 100207; 1009;
Abstract
Objective: We developed deep learning algorithms to automatically assess BI-RADS breast density.
Methods: Using a large multi-institutional patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and provided crowdsourcing evaluation from the attendees of the ACR 2019 Annual Meeting.
Results: Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class kappa of 0.667. When training was performed with randomly sampled images from the data set versus sampling an equal number of images from each density category, the model predictions were biased away from the low-prevalence categories such as extremely dense breasts. The net result was an increase in sensitivity and a decrease in specificity for predicting dense breasts for equal-class sampling compared with random sampling. We also found that the performance of the model degrades when it is evaluated on digital mammography data formats that differ from the one it was trained on, emphasizing the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists.
Conclusion: We demonstrated parameters that can influence the performance of the model and how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.
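The quantities named in the abstract (a four-class kappa, sensitivity and specificity for dense breasts, and equal-per-category sampling) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes an unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), treats BI-RADS categories c and d as "dense" per the usual convention, and draws with replacement when sampling. The function names `cohen_kappa`, `dense_sens_spec`, and `sample_equal_per_class` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# BI-RADS density categories: a = almost entirely fatty, b = scattered
# fibroglandular, c = heterogeneously dense, d = extremely dense.
CATEGORIES = ("a", "b", "c", "d")
DENSE = ("c", "d")  # assumption: "dense" means category c or d


def cohen_kappa(ref, pred, labels=CATEGORIES):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(ref) != len(pred) or len(ref) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(ref)
    observed = sum(r == p for r, p in zip(ref, pred)) / n
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(ref_counts[lab] * pred_counts[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        # Kappa is undefined here; returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def dense_sens_spec(ref, pred, dense=DENSE):
    """Sensitivity and specificity for the binary dense vs. non-dense task."""
    tp = fn = tn = fp = 0
    for r, p in zip(ref, pred):
        if r in dense:
            if p in dense:
                tp += 1
            else:
                fn += 1
        elif p in dense:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def sample_equal_per_class(labels, per_class, seed=0):
    """Indices with `per_class` draws from each category present in `labels`.

    Draws are made with replacement (an assumption of this sketch), so rare
    categories such as "d" can still supply `per_class` examples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, lab in enumerate(labels):
        by_class[lab].append(index)
    picked = []
    for lab in sorted(by_class):
        picked.extend(rng.choices(by_class[lab], k=per_class))
    return picked
```

In this sketch, `ref` would hold the interpreting radiologists' categories and `pred` the model's or the crowd's; `sample_equal_per_class` mimics the equal-class training scheme, in contrast to drawing indices uniformly at random from the whole data set.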
Pages: 1653-1662
Number of pages: 10