Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study

Cited by: 1
Authors
Wang, Chenxu [1, 2]
Wang, Fei [3]
Li, Shuhan [2]
Ren, Qing-wen [4]
Tan, Xiaomei [2]
Fu, Yaoyu [1]
Liu, Di [1, 2, 5]
Qian, Guangwu [6]
Cao, Yu [1, 7]
Yin, Rong [1, 2, 5]
Li, Kang [1, 5]
Affiliations
[1] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, 37 Guoxue Lane, Chengdu 610041, Peoples R China
[2] Sichuan Univ, Dept Ind Engn, Chengdu, Peoples R China
[3] Sichuan Univ, West China Sch Med, Dept Nursing, Chengdu, Peoples R China
[4] Univ Hong Kong, Queen Mary Hosp, Dept Med, Hong Kong, Peoples R China
[5] Sichuan Univ, Medx Ctr Informat, Chengdu, Peoples R China
[6] Sichuan Univ, Dept Comp Sci, Chengdu, Peoples R China
[7] Sichuan Univ, West China Hosp, Dept Emergency Med, Chengdu, Peoples R China
Keywords
ChatGPT; artificial intelligence; patient triage; health care; prompt engineering; large language models; Modified Early Warning Score; EARLY WARNING SCORE; MEWS
DOI
10.2196/71613
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Emergency departments (EDs) face overcrowding, prolonged waiting times, and staff shortages, all of which increase the strain on health care systems. Efficient triage and accurate departmental guidance are critical for alleviating these pressures. Recent advances in large language models (LLMs), such as ChatGPT, offer potential solutions for improving patient triage and outpatient department selection in emergency settings.
Objective: The study aimed to assess the accuracy, consistency, and feasibility of GPT-4-based ChatGPT models (GPT-4o and GPT-4-Turbo) for patient triage using the Modified Early Warning Score (MEWS) and to evaluate GPT-4o's ability to provide accurate outpatient department guidance based on simulated patient scenarios.
Methods: A 2-phase experimental study was conducted. In the first phase, 2 ChatGPT models (GPT-4o and GPT-4-Turbo) were evaluated for MEWS-based patient triage accuracy using 1854 simulated patient scenarios. Accuracy and consistency were assessed before and after prompt engineering. In the second phase, GPT-4o was tested for outpatient department selection accuracy using 264 scenarios sourced from the Chinese Medical Case Repository. Each scenario was independently evaluated by GPT-4o three times. Data analyses included Wilcoxon tests, Kendall correlation coefficients, and logistic regression analyses.
Results: In the first phase, ChatGPT's MEWS-based triage accuracy improved following prompt engineering. GPT-4o performed better before prompt engineering, but after it GPT-4-Turbo outperformed GPT-4o, achieving an accuracy of 100% compared with GPT-4o's 96.2%. This finding suggests that GPT-4-Turbo may be more adaptable to prompt optimization. In the second phase, GPT-4o, which showed better emotional responsiveness than GPT-4-Turbo, achieved an overall guidance accuracy of 92.63% (95% CI 90.34%-94.93%), with the highest accuracy in internal medicine (93.51%, 95% CI 90.85%-96.17%) and the lowest in general surgery (91.46%, 95% CI 86.50%-96.43%).
Conclusions: ChatGPT demonstrated promising capability for supporting patient triage and outpatient guidance in EDs. GPT-4-Turbo showed greater adaptability to prompt engineering, whereas GPT-4o exhibited superior responsiveness and emotional interaction, which are essential for patient-facing tasks. Future studies should explore real-world implementation and address the identified limitations to enhance ChatGPT's clinical integration.
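The accuracy figures in the Results are reported with 95% confidence intervals that are symmetric around the point estimate, which is consistent with (but does not confirm) a normal-approximation interval for a proportion. The abstract does not state which method the authors used, so the following is only a minimal sketch under that assumption. The function name `accuracy_with_wald_ci` and the example counts are hypothetical and are not taken from the study.

```python
import math


def accuracy_with_wald_ci(correct: int, total: int, z: float = 1.96):
    """Return (accuracy, lower, upper) for a proportion.

    Uses the normal-approximation (Wald) interval p +/- z * sqrt(p(1-p)/n).
    This is an assumed method; the abstract does not name the one used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only; not data from the study.
acc, lo, hi = accuracy_with_wald_ci(correct=90, total=100)
print(f"accuracy={acc:.2%}, 95% CI {lo:.2%}-{hi:.2%}")
```

The Wald interval is the simplest choice but behaves poorly when accuracy is close to 100% or the sample is small. In those cases a Wilson interval is usually preferred.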
Pages: 16