Multi-Institutional Assessment and Crowdsourcing Evaluation of Deep Learning for Automated Classification of Breast Density

Cited by: 41
Authors
Chang, Ken [1]
Beers, Andrew L. [1]
Brink, Laura [2]
Patel, Jay B. [1]
Singh, Praveer [1]
Arun, Nishanth T. [1]
Hoebel, Katharina V. [1]
Gaw, Nathan [1]
Shah, Meesam [2]
Pisano, Etta D. [3, 4]
Tilkin, Mike [5]
Coombs, Laura P. [3]
Dreyer, Keith J. [6, 7, 8, 9, 10]
Allen, Bibb [10, 11, 12]
Agarwal, Sheela [13]
Kalpathy-Cramer, Jayashree [1, 14, 15, 16, 17]
Affiliations
[1] Massachusetts Gen Hosp, Dept Radiol, Athinoula A Martinos Ctr Biomed Imaging, Boston, MA USA
[2] Amer Coll Radiol, Reston, VA USA
[3] ACR, Reston, VA USA
[4] Beth Israel Lahey Harvard Med Sch, Residence, Boston, MA USA
[5] ACR, Technol, Reston, VA USA
[6] MGH & BWH, Boston, MA USA
[7] MGH & BWH, Ctr Clin Data Sci, Boston, MA USA
[8] MGH & BWH, Radiol Informat, Boston, MA USA
[9] Harvard Med Sch, Radiol, Boston, MA 02115 USA
[10] ACR Data Sci Inst, Reston, VA USA
[11] Int Soc Radiol, Reston, VA USA
[12] Grandview Med Ctr, Birmingham, AL USA
[13] Lennox Hill Radiol, New York, NY USA
[14] Harvard Med Sch, CCDS, Boston, MA 02115 USA
[15] Harvard Med Sch, QTIM Lab, Boston, MA 02115 USA
[16] Harvard Med Sch, Ctr Machine Learning, Boston, MA 02115 USA
[17] Harvard Med Sch, Radiol, MGH, Boston, MA 02115 USA
Funding
National Institutes of Health (USA);
Keywords
ACR AI-LAB; artificial intelligence; BI-RADS; breast density; deep learning; DMIST; generalizability; mammogram; neural networks; CANCER; MAMMOGRAPHY; RADIOLOGISTS; PERFORMANCE; IMPACT; RISK;
DOI
10.1016/j.jacr.2020.05.015
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Objective: We developed deep learning algorithms to automatically assess BI-RADS breast density. Methods: Using a large multi-institution patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and provided crowdsourcing evaluation from the attendees of the ACR 2019 Annual Meeting. Results: Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class kappa of 0.667. When training was performed with randomly sampled images from the data set versus sampling an equal number of images from each density category, the model predictions were biased away from the low-prevalence categories such as extremely dense breasts. The net result was an increase in sensitivity and a decrease in specificity for predicting dense breasts for equal-class sampling compared with random sampling. We also found that the performance of the model degrades when we evaluate on digital mammography data formats that differ from the one that we trained on, emphasizing the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists. Conclusion: We demonstrated the possible parameters that can influence the performance of the model and how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.
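The agreement and dense-versus-non-dense figures reported in the abstract can be illustrated with a short sketch. It is not the authors' evaluation code. It assumes an unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), assumes BI-RADS categories c and d count as "dense", and uses made-up toy labels. The function names are hypothetical.

```python
from collections import Counter

# BI-RADS density categories; treating c and d as "dense" is an assumption.
CATEGORIES = ("a", "b", "c", "d")
DENSE = frozenset({"c", "d"})


def cohen_kappa(rater1, rater2, categories=CATEGORIES):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(rater1)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    # Chance agreement: product of the two raters' marginal frequencies per class.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single category
    return (observed - expected) / (1.0 - expected)


def dense_sensitivity_specificity(reference, predicted, dense=DENSE):
    """Sensitivity and specificity for the binary dense vs non-dense call."""
    pairs = list(zip(reference, predicted))
    tp = sum(r in dense and p in dense for r, p in pairs)
    fn = sum(r in dense and p not in dense for r, p in pairs)
    tn = sum(r not in dense and p not in dense for r, p in pairs)
    fp = sum(r not in dense and p in dense for r, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain dense and non-dense cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not study data.
    radiologist = ["a", "b", "b", "c", "c", "d", "b", "c"]
    model = ["a", "b", "c", "c", "c", "c", "b", "b"]
    print("kappa:", round(cohen_kappa(radiologist, model), 3))
    print("sens, spec:", dense_sensitivity_specificity(radiologist, model))
```

Because the density categories are ordered, a weighted kappa that penalizes a-versus-d disagreements more than a-versus-b ones is a common alternative; the unweighted form is shown here only because it is the simplest to verify by hand.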
Pages: 1653 - 1662
Page count: 10