The potential of Generative Pre-trained Transformer 4 (GPT-4) to analyse medical notes in three different languages: a retrospective model-evaluation study

Cited by: 2
Authors
Menezes, Maria Clara Saad [1 ,2 ]
Hoffmann, Alexander F. [1 ]
Tan, Amelia L. M. [1 ]
Nalbandyan, Marine [3 ]
Omenn, Gilbert S. [4 ]
Mazzotti, Diego R. [8 ,9 ]
Hernandez-Arango, Alejandro [13 ]
Visweswaran, Shyam [14 ,15 ]
Mandl, Kenneth D. [16 ]
Bourgeois, Florence [17 ]
Lee, James W. K. [18 ]
Makmur, Andrew [19 ]
Hanauer, David A. [5 ]
Semanik, Michael G.
Kerivan, Lauren T. [10 ]
Hill, Terra [10 ]
Forero, Julian [13 ]
Restrepo, Carlos [13 ]
Vigna, Matteo [20 ]
Ceriana, Piero [20 ]
Abu-el-rub, Noor [11 ]
Avillach, Paul [1 ]
Bellazzi, Riccardo [22 ]
Callaci, Thomas [3 ]
Gutierrez-Sacristan, Alba [1 ]
Malovini, Alberto [21 ]
Mathew, Jomol P. [3 ]
Morris, Michele [14 ]
Murthy, Venkatesh L. [6 ,7 ]
Buonocore, Tommaso M. [22 ]
Parimbelli, Enea [22 ]
Patel, Lav P. [11 ]
Saez, Carlos [23 ]
Samayamuthu, Malarkodi Jebathilagam [14 ]
Thompson, Jeffrey A. [12 ]
Tibollo, Valentina [21 ]
Xia, Zongqi [15 ]
Kohane, Isaac S. [1 ]
Affiliations
[1] Harvard Univ, Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[2] Univ Texas Southwestern, Dept Internal Med, Dallas, TX USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Sch Med & Publ Hlth, Madison, WI USA
[4] Univ Michigan, Computat Med & Bioinformat, Internal Med Human Genet Environm Hlth, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Dept Learning Hlth Sci, Ann Arbor, MI USA
[6] Univ Michigan, Dept Internal Med, Ann Arbor, MI USA
[7] Univ Michigan, Frankel Cardiovasc Ctr, Ann Arbor, MI USA
[8] Yale Sch Med, Dept Internal Med, Div Pulm Crit Care & Sleep Med, New Haven, CT USA
[9] Univ Kansas, Med Ctr, Dept Internal Med, Div Pulm Crit Care & Sleep Med, Kansas City, KS USA
[10] Univ Kansas, Med Ctr, Dept Surg, Kansas City, KS USA
[11] Univ Kansas, Med Ctr, Res Informat, Kansas City, KS USA
[12] Univ Kansas, Med Ctr, Dept Biostat & Data Sci, Kansas City, KS USA
[13] Univ Antioquia, Hosp Alma Mater Antioquia, Medellin, Colombia
[14] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA USA
[15] Univ Pittsburgh, Dept Neurol, Pittsburgh, PA USA
[16] Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA USA
[17] Boston Childrens Hosp, Dept Pathol, Boston, MA USA
[18] Dept Surg, Singapore, Singapore
[19] Natl Univ Hlth Syst, Dept Diagnost Imaging, Singapore, Singapore
[20] Univ Pavia, Ist Clin Sci Maugeri, Resp Rehabil Unit, Pavia, Italy
[21] Istituti Ricovero & Cura Carattere Sci IRCCS, Ist Clin Sci Maugeri, Lab Med Informat & Artificial Intelligenc, Pavia, Italy
[22] Univ Pavia, Dept Elect Comp & Biomed Engn, Pavia, Italy
[23] Univ Politecn Valencia, Inst Univ Tecnol Informac & Comunicac, Biomed Data Sci Lab, Valencia, Spain
Source
LANCET DIGITAL HEALTH | 2025 / Volume 7 / Issue 01
Keywords
DOI
10.1016/S2589-7500(24)00246-2
Chinese Library Classification (CLC) number
R-058
Subject classification code
Abstract
Background Patient notes contain substantial information but are difficult for computers to analyse due to their unstructured format. Large-language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4), have changed our ability to process text, but we do not know how effectively they handle medical notes. We aimed to evaluate the ability of GPT-4 to answer predefined questions after reading medical notes in three different languages.
Methods For this retrospective model-evaluation study, we included eight university hospitals from four countries (ie, the USA, Colombia, Singapore, and Italy). Each site submitted seven de-identified medical notes related to seven separate patients to the coordinating centre between June 1, 2023, and Feb 28, 2024. Medical notes were written between Feb 1, 2020, and June 1, 2023. One site provided medical notes in Spanish, one site provided notes in Italian, and the remaining six sites provided notes in English. We included admission notes, progress notes, and consultation notes. No discharge summaries were included in this study. We advised participating sites to choose medical notes that, at time of hospital admission, were for patients who were male or female, aged 18-65 years, had a diagnosis of obesity, had a diagnosis of COVID-19, and had submitted an admission note. Adherence to these criteria was optional and participating sites randomly chose which medical notes to submit. When entering information into GPT-4, we prepended each medical note with an instruction prompt and a list of 14 questions that had been chosen a priori. Each medical note was individually given to GPT-4 in its original language and in separate sessions; the questions were always given in English. At each site, two physicians independently validated responses by GPT-4 and responded to all 14 questions. Each pair of physicians evaluated responses from GPT-4 to the seven medical notes from their own site only. Physicians were not masked to responses from GPT-4 before providing their own answers, but were masked to responses from the other physician.
Findings We collected 56 medical notes, of which 42 (75%) were in English, seven (13%) were in Italian, and seven (13%) were in Spanish. For each medical note, GPT-4 responded to 14 questions, resulting in 784 responses. In 622 (79%, 95% CI 76-82) of 784 responses, both physicians agreed with GPT-4. In 82 (11%, 8-13) responses, one physician agreed with GPT-4. In the remaining 80 (10%, 8-13) responses, neither physician agreed with GPT-4. Both physicians agreed with GPT-4 more often for medical notes written in Spanish (86 [88%, 95% CI 79-93] of 98 responses) and Italian (82 [84%, 75-90] of 98 responses) than in English (454 [77%, 74-80] of 588 responses).
Interpretation The results of our model-evaluation study suggest that GPT-4 is accurate when analysing medical notes in three different languages. In the future, research should explore how LLMs can be integrated into clinical workflows to maximise their use in health care.
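The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted above; the variable and function names are illustrative and do not come from the study, and the confidence intervals are not recomputed because the abstract does not state how they were derived.

```python
# Counts quoted in the abstract; all names here are illustrative, not from the study.
SITES = 8
NOTES_PER_SITE = 7
QUESTIONS_PER_NOTE = 14

notes_by_language = {"English": 42, "Italian": 7, "Spanish": 7}
both_agree_by_language = {"English": 454, "Italian": 82, "Spanish": 86}
agreement_counts = {"both physicians": 622, "one physician": 82, "neither physician": 80}

total_notes = SITES * NOTES_PER_SITE                  # 8 sites x 7 notes
total_responses = total_notes * QUESTIONS_PER_NOTE    # notes x 14 questions


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Internal consistency checks on the reported counts.
assert sum(notes_by_language.values()) == total_notes
assert sum(agreement_counts.values()) == total_responses
assert sum(both_agree_by_language.values()) == agreement_counts["both physicians"]

# Number of GPT-4 responses per language (notes x questions per note).
responses_by_language = {
    language: notes * QUESTIONS_PER_NOTE
    for language, notes in notes_by_language.items()
}

print(f"notes: {total_notes}, responses: {total_responses}")
for label, count in agreement_counts.items():
    print(f"{label} agreed with GPT-4: {count} ({percent(count, total_responses)}%)")
for language, count in both_agree_by_language.items():
    denominator = responses_by_language[language]
    print(f"{language}: {count} of {denominator} ({percent(count, denominator)}%)")
```

The per-language denominators (98 and 588) follow from seven or 42 notes multiplied by 14 questions, which matches the figures given in the Findings.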
Pages: e35-e43
Page count: 9