Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies

Citations: 0
Authors
Burgisser, Nils [1 ,2 ,3 ]
Chalot, Etienne [4 ]
Mehouachi, Samia [1 ]
Buclin, Clement P. [2 ,3 ]
Lauper, Kim [1 ,3 ]
Courvoisier, Delphine S. [1 ,5 ]
Mongin, Denis [1 ,3 ]
Affiliations
[1] Geneva Univ Hosp, Div Rheumatol, Geneva, Switzerland
[2] Geneva Univ Hosp, Div Internal Med, Geneva, Switzerland
[3] Univ Geneva, Fac Med, Geneva, Switzerland
[4] Geneva Univ Hosp, Informat Syst Directorate, Geneva, Switzerland
[5] Geneva Univ Hosp, Qual Care Div, Geneva, Switzerland
Source
RMD OPEN | 2024 / Vol. 10 / Issue 04
Keywords
Gout; Crystal arthropathies; Machine Learning; Chondrocalcinosis;
DOI
10.1136/rmdopen-2024-005003
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objectives We propose and test a framework to detect disease diagnoses using a recent large language model (LLM), Meta's Llama-3-8B, on French-language electronic health record (EHR) documents. Specifically, the framework focuses on detecting gout ('goutte' in French), a ubiquitous French term that has multiple meanings beyond the disease. The study compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameters used.
Methods The framework was developed using a training and testing set of 700 paragraphs assessing 'gout' from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. All paragraphs were manually reviewed and classified by two healthcare professionals as disease (true gout) or non-disease; this classification served as the gold standard. The LLM's accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs assessing 'Calcium Pyrophosphate Deposition Disease (CPPD)'.
Results The LLM-based algorithm outperformed the regex method, achieving a 92.7% (88.7%-95.4%) positive predictive value, a 96.6% (94.6%-97.8%) negative predictive value and an accuracy of 95.4% (93.6%-96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%-97.6%). The LLM framework performed well over a wide range of parameter values.
Conclusion LLMs accurately detected disease diagnoses from EHRs, even in non-English languages. They could facilitate creating large disease registers in any language, improving disease care assessment and patient recruitment for clinical trials.
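The reported metrics (positive predictive value, negative predictive value and accuracy, each with an interval) can be illustrated with a minimal sketch. The counts below are invented for illustration and are not the study's data. The abstract does not state how the intervals were computed, so the Wilson score interval used here is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def classification_metrics(tp, fp, tn, fn):
    """PPV, NPV and accuracy from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "ppv": tp / (tp + fp),          # true disease among predicted disease
        "npv": tn / (tn + fn),          # true non-disease among predicted non-disease
        "accuracy": (tp + tn) / total,  # all correct predictions
    }


# Hypothetical counts: classifier output compared against manual review.
counts = {"tp": 40, "fp": 10, "tn": 45, "fn": 5}
metrics = classification_metrics(**counts)
ppv_low, ppv_high = wilson_interval(counts["tp"], counts["tp"] + counts["fp"])

print(metrics)
print(f"PPV interval: {ppv_low:.3f} to {ppv_high:.3f}")
```

The same two functions apply unchanged to a regex baseline: only the confusion-matrix counts differ, which makes the two methods directly comparable.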
Pages: 7