ChatGPT vs UpToDate: comparative study of usefulness and reliability of Chatbot in common clinical presentations of otorhinolaryngology-head and neck surgery

Cited by: 10

Authors:

Karimov, Ziya [1]

Allahverdiyev, Irshad [2]

Agayarov, Ozlem Yagiz [3]

Demir, Dogukan [3]

Almuradova, Elvina [4,5]

Affiliations:

[1] Ege Univ, Med Program, Fac Med, TR-35100 Izmir, Turkiye

[2] Istanbul Univ, Istanbul Fac Med, Program Med, Istanbul, Turkiye

[3] Hlth Sci Univ, Izmir Tepecik Educ & Res Hosp, Dept Otolaryngol Head & Neck Surg, Izmir, Turkiye

[4] Ege Univ, Fac Med, Dept Med Oncol, Izmir, Turkiye

[5] Medicana Int Hosp, Dept Oncol, Izmir, Turkiye

Source:

EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY | 2024 / Vol. 281 / Issue 04

Keywords:

Artificial intelligence; Chatbot; ChatGPT; ENT; UpToDate; Otorhinolaryngology and head and neck surgery; EPIDEMIOLOGY; AGREEMENT

DOI:

10.1007/s00405-023-08423-w

Chinese Library Classification (CLC) number:

R76 [Otorhinolaryngology];

Discipline classification code:

100213;

Abstract:

Purpose: The use of chatbots, a form of artificial intelligence, in medicine has been increasing in recent years. UpToDate® is a well-known search tool built on evidence-based knowledge and used daily by doctors worldwide. In this study, we aimed to investigate the usefulness and reliability of ChatGPT compared to UpToDate in Otorhinolaryngology and Head and Neck Surgery (ORL-HNS).

Materials and methods: ChatGPT-3.5 and UpToDate were queried on the management of 25 common clinical case scenarios (13 males/12 females) drawn from the literature and reflecting daily observation at the Department of Otorhinolaryngology of Ege University Faculty of Medicine. Scientific references for the management were requested for each clinical case. Reviewers assessed the accuracy of the references in the ChatGPT answers on a 0-2 scale and the usefulness of the ChatGPT and UpToDate answers on a 1-3 scale. UpToDate and ChatGPT-3.5 responses were compared.

Results: Unlike UpToDate, ChatGPT did not give references for some questions. ChatGPT's information was limited to 2021. UpToDate supported its content with subheadings, tables, figures, and algorithms. The mean accuracy score of references in ChatGPT answers was 0.25 (weak/unrelated). The median (Q1-Q3) usefulness score was 1.00 (1.25-2.00) for ChatGPT and 2.63 (2.75-3.00) for UpToDate; the difference was statistically significant (p < 0.001). UpToDate was found to be more useful and reliable than ChatGPT.

Conclusions: ChatGPT has the potential to help physicians find information, but our results suggest that it needs to be improved to increase the usefulness and reliability of the evidence-based medical knowledge it provides.


Pages: 2145-2151

Page count: 7
