Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework

Citations: 54
Authors
Kresevic, Simone [1, 2]
Giuffre, Mauro [2]
Ajcevic, Milos [1]
Accardo, Agostino [1]
Croce, Lory S. [3]
Shung, Dennis L. [2]
Affiliations
[1] Univ Trieste, Dept Engn & Architecture, Trieste, Italy
[2] Yale Univ, Dept Med Digest Dis, Yale Sch Med, New Haven, CT 06520 USA
[3] Univ Trieste, Dept Med Surg & Hlth Sci, Trieste, Italy
Keywords
ChatGPT
DOI
10.1038/s41746-024-01091-y
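Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Discipline classification code
Abstract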
Large language models (LLMs) can potentially transform healthcare, particularly by providing the right information to the right provider at the right time in the hospital workflow. This study investigates the integration of LLMs into healthcare, focusing on improving clinical decision support systems (CDSSs) through accurate interpretation of medical guidelines for the management of chronic Hepatitis C Virus infection. Using OpenAI's GPT-4 Turbo model, we developed a customized LLM framework that incorporates retrieval augmented generation (RAG) and prompt engineering. The framework converts guidelines into a structured format that LLMs can process efficiently, with the aim of producing the most accurate output. An ablation study evaluated the impact of different formatting and learning strategies on the accuracy of the LLM's answers. The baseline GPT-4 Turbo model was compared against five experimental setups of increasing complexity, which added in-context guidelines, guideline reformatting, and few-shot learning. The primary outcome was a qualitative assessment of accuracy based on expert review; secondary outcomes included quantitative measurement of the similarity between LLM-generated responses and expert-provided answers using text-similarity scores. Accuracy improved significantly, from 43% to 99% (p < 0.001), when guidelines were provided as context in a coherent corpus of text and non-text sources were converted into text. Few-shot learning did not appear to improve overall accuracy. The study highlights that structured guideline reformatting and advanced prompt engineering (data quality vs. data quantity) can enhance the efficacy of LLM integration into CDSSs for guideline delivery.
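The secondary outcome mentioned in the abstract, a text-similarity score between an LLM-generated response and an expert-provided answer, can be illustrated with a minimal sketch. The abstract does not say which similarity metrics were used, so the bag-of-words cosine similarity below is an assumed stand-in rather than the study's method; the function names and the example strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(response: str, reference: str) -> float:
    """Cosine similarity between the word-count vectors of two texts.

    Assumption: a simple bag-of-words metric chosen for illustration;
    it is not taken from the study.
    """
    a = Counter(tokenize(response))
    b = Counter(tokenize(reference))
    # Dot product over the tokens that the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Return 0.0 when either text has no tokens, avoiding division by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical placeholder texts, not data from the study.
    expert_answer = "Refer to the guideline section on treatment duration."
    llm_response = "The guideline section on treatment duration applies here."
    print(f"similarity = {cosine_similarity(llm_response, expert_answer):.2f}")
```

A score near 1 indicates heavy word overlap with the expert answer and a score of 0 indicates none. Word overlap does not capture clinical correctness, which is consistent with the abstract treating expert review as the primary outcome and similarity scores as secondary.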
Pages: 9