Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing

Cited by: 19

Authors:

Sumbal, Anusha [1]

Sumbal, Ramish [1]

Amir, Alina [1]

Affiliation:

[1] Dow Univ Hlth Sci, Baba E Urdu Rd, Karachi 74200, Pakistan

Source:

JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT | 2024 / Volume 11

Keywords:

ChatGPT; academic performance; medical education; artificial intelligence; digital health; medicine

DOI:

10.1177/23821205241238641

Chinese Library Classification (CLC) number:

G40 [Education]

Discipline classification codes:

040101; 120403

Abstract:

OBJECTIVE: We aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams.

METHOD: Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PUBMED/MEDLINE, Google Scholar, and Cochrane. The databases were queried from their inception until April 4, 2023. A formal narrative analysis was conducted by systematically grouping the similarities and differences among individual findings.

RESULTS: After rigorous screening, 12 articles were included in this review. All the selected papers assessed the academic performance of ChatGPT-3.5. One study compared the performance of ChatGPT-3.5 with that of ChatGPT-4 on a medical exam. Overall, ChatGPT performed well in 4 tests, performed averagely in 4 tests, and performed badly in 4 tests. ChatGPT's performance was directly proportional to the level of the questions' difficulty but did not differ notably by whether the questions were binary, descriptive, or MCQ-based. ChatGPT's explanation, reasoning, memory, and accuracy were remarkably good, whereas it failed to understand image-based questions and lacked insight and critical thinking.

CONCLUSION: ChatGPT-3.5 performed satisfactorily in the exams it took as an examinee. However, future related studies are needed to fully explore the potential of ChatGPT in medical education.

Pages: 12