The intent of ChatGPT usage and its robustness in medical proficiency exams: a systematic review

Cited: 0
Authors
Tatiana Chaiban [1 ]
Zeinab Nahle [2 ]
Ghaith Assi [2 ]
Michelle Cherfane [2 ]
Affiliations
[1] Department of Social and Education Sciences, School of Arts and Sciences, Lebanese American University, Beirut
[2] Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, P.O. Box 36, Byblos
[3] INSPECT-LB (Institut National de Santé Publique, d’Épidémiologie Clinique Et de Toxicologie-Liban), Beirut
Source
Discover Education, Vol. 3, Issue 1
Keywords
ChatGPT; Subspecialties; Written medical examinations
D O I
10.1007/s44217-024-00332-2
Abstract
Background: Since its launch, ChatGPT, a Large Language Model (LLM), has been widely used across different disciplines, particularly in the medical field. Objective: The main aim of this review is to thoroughly assess the performance of distinct versions of ChatGPT on subspecialty written medical proficiency exams and the factors that impact it. Methods: Three online databases were searched for articles that fit the intended objectives of the study: PubMed, CINAHL, and Web of Science. A group of reviewers was assembled to create an appropriate methodological framework for selecting the articles to be included. Results: Sixteen articles were included in this review, assessing the performance of different ChatGPT versions across subspecialty written examinations in fields such as surgery, neurology, orthopedics, trauma and orthopedics, core cardiology, family medicine, and dermatology. The studies reported different passing grades and rankings, with accuracy rates ranging from 35.8% to 91% across datasets and subspecialties. The factors highlighted as impacting accuracy were: (1) the ChatGPT version; (2) the medical subspecialty; (3) the type of question; (4) the language; and (5) the comparators. Conclusions: This review characterizes ChatGPT’s performance on different medical specialty examinations and motivates further research into whether ChatGPT can enhance learning and support medical students taking a range of medical specialty exams. However, to avoid misuse and any detrimental effects on real-world medicine, it is crucial to be aware of its limitations and to improve the ongoing evaluation of this AI tool. © The Author(s) 2024.
Related papers
6 items
  • [1] Systematic review of ChatGPT accuracy and performance in Iran's medical licensing exams: A brief report
    Keshtkar, Alireza
    Atighi, Farnaz
    Reihani, Hamid
    JOURNAL OF EDUCATION AND HEALTH PROMOTION, 2024, 13 (01)
  • [2] A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research
    Bagde, Hiroj
    Dhopte, Ashwini
    Alam, Mohammad Khursheed
    Basri, Rehana
    HELIYON, 2023, 9 (12)
  • [3] Performance of ChatGPT in medical examinations: A systematic review and a meta-analysis
    Levin, Gabriel
    Horesh, Nir
    Brezinov, Yoav
    Meyer, Raanan
    BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY, 2024, 131 (03) : 378 - 380
  • [4] Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing
    Sumbal, Anusha
    Sumbal, Ramish
    Amir, Alina
    JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT, 2024, 11
  • [5] Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis
    Wei, Qiuhong
    Yao, Zhengxiong
    Cui, Ying
    Wei, Bo
    Jin, Zhezhen
    Xu, Ximing
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 151
  • [6] ChatGPT integration within nursing education and its implications for nursing students: A systematic review and text network analysis
    Gunawan, Joko
    Aungsuroch, Yupin
    Montayre, Jed
    NURSE EDUCATION TODAY, 2024, 141