Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study

Cited by: 21
Authors
Sarbay, Ibrahim [1]
Berikol, Goeksu Bozdereli [2]
Ozturan, Ibrahim Ulas [3,4]
Affiliations
[1] Kesan State Hosp, Dept Emergency Med, Asagi Zaferiye Mahallesi Evrese Caddesi, Edirne, Turkiye
[2] Bakirkoy Dr Sadi Konuk Training & Res Hosp, Dept Emergency Med, Istanbul, Turkiye
[3] Kocaeli Univ, Fac Med, Dept Emergency Med, Kocaeli, Turkiye
[4] Acibadem Univ, Inst Hlth Sci, Dept Med Educ, Istanbul, Turkiye
Source
TURKISH JOURNAL OF EMERGENCY MEDICINE | 2023, Vol. 23, Issue 3
Keywords
Chatbot; ChatGPT; emergency severity index; triage; chief complaint
DOI
10.4103/tjem.tjem_79_23
CLC number
R4 [Clinical Medicine]
Discipline codes
1002; 100602
Abstract
OBJECTIVES: Artificial intelligence companies have recently intensified their efforts to improve chatbots, software programs that converse with humans in natural language. The role of chatbots in health care warrants investigation. OpenAI's ChatGPT is a chatbot built on a large language model trained with supervised and reinforcement learning. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction.
METHODS: This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the Emergency Severity Index (ESI) handbook v4 cases. Two independent EM specialists, both experts in the ESI triage scale, determined the triage category for each case; a third independent EM specialist was consulted as arbiter when necessary. The consensus result for each case scenario was taken as the reference triage category. Each case scenario was then queried with ChatGPT, and the answer was recorded as the index triage category. Classifications in which ChatGPT disagreed with the reference category were defined as over-triage (false positive) or under-triage (false negative).
RESULTS: Fifty case scenarios were assessed. Reliability analysis showed fair agreement between the EM specialists and ChatGPT (Cohen's kappa: 0.341). Eleven cases (22%) were over-triaged and 9 cases (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. Overall, ChatGPT had a sensitivity of 57.1% (95% confidence interval [CI]: 34-78.2), specificity of 34.5% (95% CI: 17.9-54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8-57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9-75.6), and an F1 score of 0.461. In high-acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8-91.8), specificity of 93.1% (95% CI: 77.2-99.2), PPV of 88.9% (95% CI: 65.3-98.6), NPV of 84.4% (95% CI: 67.2-94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724-0.969, P < 0.001) for high-acuity cases.
CONCLUSION: ChatGPT performed best when predicting high-acuity cases (ESI-1 and ESI-2), and may be useful for identifying cases requiring critical care. If trained with more medical knowledge, ChatGPT may become more accurate in the other triage categories.
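All of the metrics in the abstract follow from a single 2x2 confusion matrix once a positive class is fixed (here, high-acuity ESI-1/ESI-2 versus the rest). The Python sketch below, which is not the study's code, shows how sensitivity, specificity, PPV, NPV, and F1 are derived from true/false positive/negative counts. The example counts are back-calculated from the reported high-acuity percentages (e.g., 76.2% sensitivity is consistent with 16/21) and are an assumption, not data taken from the paper.

    # Minimal sketch (not the study's code): deriving the abstract's
    # triage metrics from 2x2 confusion-matrix counts.

    def triage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
        """Sensitivity, specificity, PPV, NPV, and F1 from raw counts.

        Per the study's definitions, over-triage corresponds to false
        positives (fp) and under-triage to false negatives (fn).
        """
        sensitivity = tp / (tp + fn)   # share of true positives detected
        specificity = tn / (tn + fp)   # share of true negatives detected
        ppv = tp / (tp + fp)           # positive predictive value (precision)
        npv = tn / (tn + fn)           # negative predictive value
        f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
        return {"sensitivity": sensitivity, "specificity": specificity,
                "PPV": ppv, "NPV": npv, "F1": f1}

    # Hypothetical split of the 50 cases with ESI-1/ESI-2 as the positive
    # class: 21 reference-positive and 29 reference-negative cases.
    print(triage_metrics(tp=16, fp=2, tn=27, fn=5))
    # -> sensitivity 0.762, specificity 0.931, PPV 0.889, NPV 0.844, F1 0.821

These assumed counts reproduce the reported high-acuity figures exactly (16/21 = 76.2%, 27/29 = 93.1%, 16/18 = 88.9%, 27/32 = 84.4%), and the resulting F1 of 0.821 confirms that the reported F1 is the harmonic mean of PPV and sensitivity.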
Pages: 156+
Page count: 9