Experts vs. machine - comparison of machine learning to expert-informed prediction of outcome after major liver surgery

Cited by: 0
Authors
Staiger, Roxane D. [1, 6]
Mehra, Tarun [2 ]
Haile, Sarah R. [3 ]
Domenghino, Anja [1 ]
Kuemmerli, Christoph [5 ]
Abbassi, Fariba [1 ]
Kozbur, Damian [4 ]
Dutkowski, Philipp [1 ]
Puhan, Milo A. [3 ]
Clavien, Pierre-Alain [1]
Affiliations
[1] Univ Hosp Zurich, Dept Surg & Transplantat, Zurich, Switzerland
[2] Univ Hosp Zurich, Dept Med Oncol & Hematol, Zurich, Switzerland
[3] Univ Zurich, Epidemiol Biostat & Prevent Inst, Dept Epidemiol, Zurich, Switzerland
[4] Univ Zurich, Dept Econ, Zurich, Switzerland
[5] Clarunis Univ Hosp, Dept Surg, Basel, Switzerland
[6] Cantonal Hosp Lucerne, Dept Surg, Spitalstr, CH-6000 Luzern, Switzerland
Keywords
ARTIFICIAL-INTELLIGENCE; SURGICAL COMPLICATIONS; DECISION-SUPPORT; RISK; CLASSIFICATION; MORTALITY; SYSTEM; FUTURE
DOI
10.1016/j.hpb.2024.02.006
Chinese Library Classification (CLC) number
R57 [Diseases of the digestive system and abdomen]
Abstract
Background: Machine learning (ML) has been successfully implemented for classification tasks (e.g., cancer diagnosis). ML performance for more challenging predictions is largely unexplored. The objective of this study was to compare ML with expert-informed predictions of surgical outcome in patients undergoing major liver surgery.
Methods: Data from a single tertiary center on preoperative parameters and postoperative complications in elective hepatic surgery patients were included (2008-2021). Expert-informed prediction models were built on 14 parameters that two expert liver surgeons identified as affecting postoperative outcome. ML models used all available preoperative patient variables (n = 62). Model performance was compared for predicting 3-month postoperative overall morbidity, measured with the Comprehensive Complication Index (CCI). Temporal validation and an additional analysis in major liver resection patients were conducted.
Results: 889 patients were included. Expert-informed models showed low average bias (2-5 CCI points) with high over- and underprediction. ML models performed similarly: the average prediction was 5-10 points higher than the observed CCI values, with high variability (95% CI -30 to 50). No performance improvement was seen for major liver resection.
Conclusion: ML showed no clinically relevant benefit for predicting postoperative overall morbidity. Although ML is currently surrounded by hype, it has potential for application in clinical practice. At this stage, however, it does not replace established approaches to prediction modelling.
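The Results report an average bias in CCI points together with a wide interval around it. A minimal sketch of how such figures can be obtained is given below, assuming a Bland-Altman style summary (mean difference and 95% limits of agreement); the abstract does not state how its interval was computed. The function name `agreement_summary` and all CCI values in the example are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def agreement_summary(predicted, observed):
    """Return (bias, lower, upper) for paired predictions and observations.

    bias  = mean of (predicted - observed)
    lower = bias - 1.96 * SD of the differences
    upper = bias + 1.96 * SD of the differences
    This is an assumed Bland-Altman style summary, not the study's own code.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")
    diffs = [p - o for p, o in zip(predicted, observed)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical CCI values (scale 0-100), for illustration only.
    observed_cci = [0.0, 8.7, 20.9, 33.7, 0.0, 42.4]
    predicted_cci = [12.0, 15.0, 18.0, 20.0, 10.0, 25.0]
    bias, lower, upper = agreement_summary(predicted_cci, observed_cci)
    print(f"bias = {bias:.1f} CCI points, "
          f"95% limits of agreement = {lower:.1f} to {upper:.1f}")
```

A small bias combined with wide limits corresponds to the pattern described in the Results: predictions that are close on average but can be far off for an individual patient.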
Pages: 674-681
Page count: 8