GenAI exceeds clinical experts in predicting acute kidney injury following paediatric cardiopulmonary bypass

Cited by: 0
Authors
Sharabiani, Mansour [1]
Mahani, Alireza [2]
Bottle, Alex [1]
Srinivasan, Yadav [3]
Issitt, Richard [3]
Stoica, Serban [4]
Affiliations
[1] Imperial Coll London, Sch Publ Hlth, London, England
[2] New York Stock Exchange, New York 10005, NY USA
[3] Great Ormond St Hosp Sick Children, London, England
[4] Bristol Royal Hosp Children, Bristol, England
Keywords
Generative artificial intelligence; Text embedding; Electronic health records; Cardiopulmonary bypass; Acute kidney injury; Spherical k-means; IDENTIFICATION; MODEL; TEXT
DOI
10.1038/s41598-025-04651-8
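Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]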
Discipline classification code
07; 0710; 09
Abstract
The emergence of large language models (LLMs) opens new horizons for leveraging the often unused information in clinical text. Our study aims to capitalise on this new potential. Specifically, we examine the utility of text embeddings generated by LLMs in predicting postoperative acute kidney injury (AKI) in paediatric cardiopulmonary bypass (CPB) patients using electronic health record (EHR) text, and propose methods for explaining their output. AKI can be a serious complication in paediatric CPB, and its accurate prediction can significantly improve patient outcomes by enabling timely interventions. We evaluate various text embedding algorithms, such as Doc2Vec, top-performing sentence transformers on Hugging Face, and commercial LLMs from Google and OpenAI. We benchmark the cross-validated performance of these 'AI models' against a 'baseline model' as well as an established, clinically defined 'expert model'. The baseline model includes structured features, i.e., patient gender, age, height, body mass index and length of operation. The majority of AI models surpass not only the baseline model but also the expert model. An ensemble of AI and clinical-expert models improves discriminative performance by 23% compared to the baseline model. Consistency of patient clusters formed from AI-generated embeddings with clinical-expert clusters, measured via the adjusted Rand index and adjusted mutual information metrics, illustrates the medical validity of LLM embeddings. We create a reverse mapping from the numeric embedding space to the natural-language domain via the embedding-based clusters, generating medical labels for the clusters in the process. We also use text-generating LLMs to summarise the differences between AI and expert clusters. Such 'explainability' outputs can increase medical practitioners' trust in AI applications and help generate new hypotheses, e.g., by studying the association between cluster memberships and outcomes of interest.
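The cluster-consistency check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D `toy_embeddings` stand in for real LLM embeddings, `expert_labels` are invented for illustration, and the function names are hypothetical. The sketch runs a basic spherical k-means (clustering by cosine similarity on unit-normalised vectors) and scores agreement with the reference labels using the adjusted Rand index; adjusted mutual information is omitted for brevity.

```python
import math
import random
from collections import Counter


def normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(vectors, k, n_iter=50, seed=0):
    """Cluster vectors by direction: assign by cosine similarity, re-normalise centroids."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: on unit vectors the dot product equals cosine similarity.
        new_labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: mean of the members, projected back onto the unit sphere.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected pair-counting agreement between two labelings."""
    total = math.comb(len(labels_a), 2)
    if total == 0:
        return 1.0
    sum_cells = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings have one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical data: two directions with varying magnitudes, plus invented expert labels.
toy_embeddings = [[1, 0.1], [5, 0.2], [0.1, 1], [0.3, 4], [2, 0.1], [0.1, 3]]
expert_labels = [0, 0, 1, 1, 0, 1]

ai_labels = spherical_kmeans(toy_embeddings, k=2)
print(adjusted_rand_index(ai_labels, expert_labels))
```

Because the adjusted Rand index ignores how clusters are named, it does not matter whether the algorithm calls a group 0 or 1; only the grouping of patients is compared.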
Pages: 16