Total: 24 records
ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model
Cited by: 16
Authors:
Ebrahimian, Manoochehr [1]; Behnam, Behdad [2]; Ghayebi, Negin [3]; Sobhrakhshankhah, Elham [2]
Affiliations:
[1] Shahid Beheshti Univ Med Sci, Res Inst Childrens Hlth, Pediat Surg Res Ctr, Tehran, Iran
[2] Iran Univ Med Sci, Gastrointestinal & Liver Dis Res Ctr, Tehran, Iran
[3] Shahid Beheshti Univ Med Sci, Sch Med, Tehran, Iran
Keywords:
Artificial Intelligence; Decision Making, Computer-Assisted; Neural Networks, Computer; RISKS
DOI: 10.1136/bmjhci-2023-100815
CLC number: R19 [Health care organization and services (health services administration)]
Subject classification number:
Abstract:
Introduction: Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance on medical examinations, and how it compares with random guessing, have not been extensively studied.
Methods: This study evaluated the performance of ChatGPT on the preinternship examination, a comprehensive medical assessment for students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated into English. A statistical analysis was conducted to assess ChatGPT's performance and to compare it with a random test group.
Results: ChatGPT performed well, answering 68.5% of the questions correctly and significantly surpassing the pass mark of 45%. It exhibited superior performance in decision-making and passed all specialties. Compared with the random test group, ChatGPT's performance was significantly higher, demonstrating its ability to provide more accurate responses and reasoning.
Conclusion: This study highlights the potential of ChatGPT in medical licensing examinations and its advantage over random guessing. However, ChatGPT still falls short of human physicians in diagnostic accuracy and decision-making. Caution should be exercised when using ChatGPT, and its results should be verified by human experts to ensure patient safety and avoid potential errors in the medical field.
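The comparison against random guessing described in the Methods can be illustrated with an exact one-sided binomial test on the reported score (68.5% of 200 questions). This is only a sketch, not the paper's actual analysis: the 0.25 chance rate assumes four-option multiple-choice questions, which the abstract does not state, and the helper function below is illustrative.

```python
import math

def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

n = 200                     # number of questions (from the abstract)
correct = round(0.685 * n)  # 137 correct answers (68.5%)
chance = 0.25               # ASSUMED guessing rate for 4-option MCQs

# One-sided p-value: probability of scoring at least this well by guessing
p_value = binom_sf(correct, n, chance)
print(f"{correct}/{n} correct; one-sided p vs. chance = {p_value:.3g}")
```

Under this assumption the p-value is vanishingly small, consistent with the abstract's claim that ChatGPT's performance was significantly higher than the random test group's.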
Pages: 6