Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

Cited by: 20
Authors
Lahat, Adi [1 ,2 ]
Sharif, Kassem [1 ,3 ]
Zoabi, Narmin [1 ]
Patt, Yonatan Shneor [3 ]
Sharif, Yousra [4 ]
Fisher, Lior [3 ]
Shani, Uria [3 ]
Arow, Mohamad [3 ]
Levin, Roni [3 ]
Klang, Eyal [5 ]
Affiliations
[1] Tel Aviv Univ, Chaim Sheba Med Ctr, Dept Gastroenterol, IL-5262100 Ramat Gan, Israel
[2] Ben Gurion Univ Negev, Samson Assuta Ashdod Med Ctr, Dept Gastroenterol, Beer Sheva, Israel
[3] Sheba Med Ctr, Internal Med B, Tel Aviv, Israel
[4] Hadassah Med Ctr, Dept Internal Med C, Jerusalem, Israel
[5] Icahn Sch Med Mt Sinai, Div Data Driven & Digital Med (D3M), New York, NY, USA
Keywords
ChatGPT; chat-GPT; chatbot; chatbots; chat-bot; chat-bots; natural language processing; NLP; artificial intelligence; AI; machine learning; ML; algorithm; algorithms; predictive model; predictive models; predictive analytics; predictive system; practical model; practical models; internal medicine; ethics; ethical; ethical dilemma; ethical dilemmas; bioethics; emergency medicine; EM medicine; ED physician; emergency physician; emergency doctor;
DOI
10.2196/54571
Chinese Library Classification
R19 [Health care organization and services (health services management)]
Discipline Classification
Abstract
Background: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.

Objective: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings and specific question types.

Methods: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.

Results: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions.

Conclusions: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
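The rating aggregation described in the Methods — per-dimension 1-5 scores summarized as mean (SD) — can be sketched as follows. This is a minimal illustration only: the rating values, the rater counts, and the two-dimension subset are hypothetical placeholders, not the study's data or analysis code.

```python
from statistics import mean, stdev

# Hypothetical 1-5 ratings per model and evaluation dimension; the
# study used five dimensions (accuracy, relevance, clarity, utility,
# comprehensiveness), two of which are shown here for brevity.
ratings = {
    "gpt4":  {"accuracy": [5, 4, 5, 4], "clarity": [4, 5, 4, 5]},
    "gpt35": {"accuracy": [4, 4, 3, 4], "clarity": [4, 3, 4, 4]},
}

def summarize(model_ratings):
    """Return (mean, SD) per dimension, matching the abstract's
    'mean, SD' reporting format, rounded to 2 decimals."""
    return {dim: (round(mean(scores), 2), round(stdev(scores), 2))
            for dim, scores in model_ratings.items()}

for model, dims in ratings.items():
    print(model, summarize(dims))
```

Significance testing between rater groups (the reported P values) would require an additional inferential step, e.g. a rank-based or t-type test, which is omitted from this descriptive sketch.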
Pages: 15