Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction

Cited by: 25
Authors
Olsson, Henrik [1]
Kartasalo, Kimmo [1]
Mulliqi, Nita [1]
Capuccini, Marco [2]
Ruusuvuori, Pekka [3, 4]
Samaratunga, Hemamali [5, 6]
Delahunt, Brett [7]
Lindskog, Cecilia [8]
Janssen, Emiel A. M. [9, 10]
Blilie, Anders [9, 10]
Egevad, Lars [11]
Spjuth, Ola [2]
Eklund, Martin [1]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, Stockholm, Sweden
[2] Uppsala Univ, Dept Pharmaceut Biosci, Uppsala, Sweden
[3] Univ Turku, Inst Biomed, Turku, Finland
[4] Tampere Univ, Fac Med & Hlth Technol, Tampere, Finland
[5] Aquesta Uropathol, Brisbane, Qld, Australia
[6] Univ Queensland, Brisbane, Qld, Australia
[7] Univ Otago, Wellington Sch Med & Hlth Sci, Dept Pathol & Mol Med, Wellington, New Zealand
[8] Uppsala Univ, Dept Immunol Genet & Pathol, Uppsala, Sweden
[9] Stavanger Univ Hosp, Dept Pathol, Stavanger, Norway
[10] Univ Stavanger, Fac Sci & Technol, Stavanger, Norway
[11] Karolinska Inst, Dept Oncol Pathol, Solna, Sweden
Funding
Swedish Research Council
Keywords
PROSTATE-CANCER; BIOPSIES;
DOI
10.1038/s41467-022-34945-8
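Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract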
Unreliable predictions can occur when an artificial intelligence (AI) system is presented with data it has not been exposed to during training. We demonstrate the use of conformal prediction to detect unreliable predictions, using histopathological diagnosis and grading of prostate biopsies as an example. We digitized 7788 prostate biopsies from 1192 men in the STHLM3 diagnostic study, used for training, and 3059 biopsies from 676 men used for testing. With conformal prediction, 1 in 794 (0.1%) predictions is incorrect for cancer diagnosis (compared to 14 errors [2%] without conformal prediction), while 175 (22%) of the predictions are flagged as unreliable when the AI system is presented with new data from the same lab and scanner that it was trained on. With small samples (N = 49 for external scanner, N = 10 for external lab and scanner, and N = 12 for external lab, scanner and pathology assessment), conformal prediction could detect systematic differences in external data leading to worse predictive performance. The AI system with conformal prediction commits 3 (2%) errors for cancer detection in cases of atypical prostate tissue compared to 44 (25%) without conformal prediction, while the system flags 143 (80%) of these predictions as unreliable. We conclude that conformal prediction can increase the patient safety of AI systems.
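To make the idea of flagging unreliable predictions concrete, the following is a minimal sketch of a class-conditional (Mondrian) inductive conformal predictor for a two-class benign/cancer problem. It is not the authors' implementation: the function names, the nonconformity measure (one minus the predicted class probability), the toy numbers, and the rule that any prediction set without exactly one label counts as "flagged" are all assumptions made for illustration.

```python
import numpy as np


def conformal_p_values(cal_probs, cal_labels, test_probs):
    """Class-conditional conformal p-values (hypothetical helper).

    cal_probs  : (n_cal, n_classes) predicted probabilities on a held-out
                 calibration set
    cal_labels : (n_cal,) true class indices for the calibration set
    test_probs : (n_test, n_classes) predicted probabilities for new cases
    Returns an (n_test, n_classes) array of p-values.
    """
    n_classes = cal_probs.shape[1]
    p_values = np.zeros(test_probs.shape, dtype=float)
    for c in range(n_classes):
        # Assumed nonconformity measure: 1 - predicted probability of class c.
        cal_nc = 1.0 - cal_probs[cal_labels == c, c]
        test_nc = 1.0 - test_probs[:, c]
        # Count calibration cases of class c that are at least as
        # nonconforming as each test case under the hypothesis "label is c".
        n_ge = (cal_nc[None, :] >= test_nc[:, None]).sum(axis=1)
        p_values[:, c] = (n_ge + 1) / (len(cal_nc) + 1)
    return p_values


def flag_unreliable(p_values, significance):
    """Build prediction sets and flag cases without exactly one label."""
    prediction_sets = p_values > significance
    # Assumption: empty sets and multi-label sets are both treated as
    # unreliable and would be referred to a pathologist.
    flagged = prediction_sets.sum(axis=1) != 1
    return prediction_sets, flagged


# Toy, made-up probabilities (column 0 = benign, column 1 = cancer).
cal_probs = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],   # true benign
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6],   # true cancer
])
cal_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
test_probs = np.array([[0.95, 0.05], [0.50, 0.50], [0.65, 0.35]])

p = conformal_p_values(cal_probs, cal_labels, test_probs)
sets, flagged = flag_unreliable(p, significance=0.25)
print(p)
print(sets)
print(flagged)
```

Under the usual exchangeability assumption, the fraction of prediction sets that miss the true label is bounded by the chosen significance level, which is why a lower significance level tends to trade fewer errors for more flagged cases. The significance level of 0.25 here is chosen only so that the tiny calibration set produces a mix of single-label and empty sets.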
Pages: 10