Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study

Cited by: 413
Authors
Bulten, Wouter [1]
Pinckaers, Hans [1]
van Boven, Hester [3]
Vink, Robert [4]
de Bel, Thomas [1]
van Ginneken, Bram [2]
van der Laak, Jeroen [1]
Hulsbergen-van de Kaa, Christina [4]
Litjens, Geert [1]
Affiliations
[1] Radboud Univ Nijmegen, Med Ctr, Radboud Inst Hlth Sci, Dept Pathol, Nijmegen, Netherlands
[2] Radboud Univ Nijmegen, Med Ctr, Radboud Inst Hlth Sci, Dept Radiol & Nucl Med, Nijmegen, Netherlands
[3] Antoni van Leeuwenhoek Hosp, Netherlands Canc Inst, Dept Pathol, Amsterdam, Netherlands
[4] Lab Pathol East Netherlands, Hengelo, Netherlands
Keywords
ISUP CONSENSUS CONFERENCE; INTEROBSERVER REPRODUCIBILITY; INTERNATIONAL-SOCIETY; CARCINOMA;
DOI
10.1016/S1470-2045(19)30739-9
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Background: The Gleason score is the strongest correlating predictor of recurrence for prostate cancer, but has substantial inter-observer variability, limiting its usefulness for individual patients. Specialised urological pathologists have greater concordance; however, such expertise is not widely available. Prostate cancer diagnostics could thus benefit from robust, reproducible Gleason grading. We aimed to investigate the potential of deep learning to perform automated Gleason grading of prostate biopsies.
Methods: In this retrospective study, we developed a deep-learning system to grade prostate biopsies following the Gleason grading standard. The system was developed using randomly selected biopsies, sampled by the biopsy Gleason score, from patients at the Radboud University Medical Center (pathology report dated between Jan 1, 2012, and Dec 31, 2017). A semi-automatic labelling technique was used to circumvent the need for manual annotations by pathologists, using pathologists' reports as the reference standard during training. The system was developed to delineate individual glands, assign Gleason growth patterns, and determine the biopsy-level grade. For validation of the method, a consensus reference standard was set by three expert urological pathologists on an independent test set of 550 biopsies. Of these 550, 100 were used in an observer experiment, in which the system, 13 pathologists, and two pathologists in training were compared with respect to the reference standard. The system was also compared to an external test dataset of 886 cores, which contained 245 cores from a different centre that were independently graded by two pathologists.
Findings: We collected 5759 biopsies from 1243 patients. The developed system achieved a high agreement with the reference standard (quadratic Cohen's kappa 0.918, 95% CI 0.891-0.941) and scored highly at clinical decision thresholds: benign versus malignant (area under the curve 0.990, 95% CI 0.982-0.996), grade group of 2 or more (0.978, 0.966-0.988), and grade group of 3 or more (0.974, 0.962-0.984). In an observer experiment, the deep-learning system scored higher (kappa 0.854) than the panel (median kappa 0.819), outperforming 10 of 15 pathologist observers. On the external test dataset, the system obtained a high agreement with the reference standard set independently by two pathologists (quadratic Cohen's kappa 0.723 and 0.707) and within inter-observer variability (kappa 0.71).
Interpretation: Our automated deep-learning system achieved a performance similar to pathologists for Gleason grading and could potentially contribute to prostate cancer diagnosis. The system could potentially assist pathologists by screening biopsies, providing second opinions on grade group, and presenting quantitative measurements of volume percentages.
Copyright (C) 2020 Elsevier Ltd. All rights reserved.
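The agreement figures in the abstract are quadratically weighted Cohen's kappa values. As an illustration only, a minimal sketch of that statistic follows. It is not the authors' evaluation code. The function name `quadratic_weighted_kappa`, the encoding of grades as integers 0 to 5 (0 for benign, 1 to 5 for grade groups), and the toy grade lists are all assumptions made for this example.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratically weighted Cohen's kappa for two lists of integer labels.

    Labels are assumed to be integers in range(num_classes); here that is
    0 (benign) to 5 (grade group 5), which is an assumption of this sketch.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B
    max_dist = (num_classes - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic weight: larger grade differences are penalised more.
            weight = (i - j) ** 2 / max_dist
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * hist_a[i] * hist_b[j] / (n * n)

    if expected_disagreement == 0:
        # Both raters used a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy, made-up grades for illustration; not data from the study.
reference = [0, 1, 2]
predicted = [0, 1, 1]
kappa = quadratic_weighted_kappa(reference, predicted, num_classes=3)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The quadratic weights mean that confusing adjacent grade groups costs less than confusing distant ones.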
Pages: 233-241
Number of pages: 9