Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing

Cited by: 19

Authors:

Sumbal, Anusha [1]

Sumbal, Ramish [1]

Amir, Alina [1]

Affiliations:

[1] Dow Univ Hlth Sci, Baba E Urdu Rd, Karachi 74200, Pakistan

Source:

JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT | 2024 / Volume 11

Keywords:

ChatGPT; academic performance; medical education; artificial intelligence; digital health; medicine

DOI:

10.1177/23821205241238641

Chinese Library Classification (CLC) number:

G40 [Pedagogy]

Discipline classification codes:

040101; 120403

Abstract:

OBJECTIVE: We aimed to conduct a systematic review assessing the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams.

METHOD: Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PUBMED/MEDLINE, Google Scholar, and Cochrane. The databases were queried from inception until April 4, 2023. A formal narrative analysis was conducted by systematically grouping the similarities and differences among individual findings.

RESULTS: After rigorous screening, 12 articles were included in this review. All the selected papers assessed the academic performance of ChatGPT-3.5. One study compared the performance of ChatGPT-3.5 with that of ChatGPT-4 on a medical exam. Overall, ChatGPT performed well in 4 tests, averagely in 4 tests, and poorly in 4 tests. ChatGPT's performance was directly proportional to the level of the questions' difficulty but did not differ notably according to whether the questions were binary, descriptive, or MCQ-based. ChatGPT's explanation, reasoning, memory, and accuracy were remarkably good, whereas it failed to understand image-based questions and lacked insight and critical thinking.

CONCLUSION: ChatGPT-3.5 performed satisfactorily in the exams it took as an examinee. However, further studies are needed to fully explore the potential of ChatGPT in medical education.


Pages: 12

50 entries in total

[31] A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research
Bagde, Hiroj
Dhopte, Ashwini
Alam, Mohammad Khursheed
Basri, Rehana
HELIYON, 2023, 9 (12)
[32] Below average ChatGPT performance in medical microbiology exam compared to university students
Sallam, Malik
Al-Salahat, Khaled
FRONTIERS IN EDUCATION, 2023, 8
[33] Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study
Rojas, Marcos
Rojas, Marcelo
Burgess, Valentina
Toro-Perez, Javier
Salehi, Shima
JMIR MEDICAL EDUCATION, 2024, 10
[34] Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis
Liu, Mingxin
Okuhara, Tsuyoshi
Chang, XinYi
Shirabe, Ritsuko
Nishiie, Yuriko
Okada, Hiroko
Kiuchi, Takahiro
JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
[35] ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT?
Heisinger, Stephan
Salzmann, Stephan N.
Senker, Wolfgang
Aspalter, Stefan
Oberndorfer, Johannes
Matzner, Michael P.
Stienen, Martin N.
Motov, Stefan
Huber, Dominikus
Grohs, Josef Georg
JOURNAL OF CLINICAL MEDICINE, 2024, 13 (24)
[36] Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations
Alfertshofer, Michael
Hoch, Cosima C.
Funk, Paul F.
Hollmann, Katharina
Wollenberg, Barbara
Knoedler, Samuel
Knoedler, Leonard
ANNALS OF BIOMEDICAL ENGINEERING, 2024, 52 (06) : 1542 - 1545
[37] ChatGPT in medicine: A cross-disciplinary systematic review of ChatGPT's (artificial intelligence) role in research, clinical practice, education, and patient interaction
Fatima, Afia
Shafique, Muhammad Ashir
Alam, Khadija
Ahmed, Tagwa Kalool Fadlalla
Mustafa, Muhammad Saqlain
MEDICINE, 2024, 103 (32)
[39] Overview of Early ChatGPT's Presence in Medical Literature: Insights From a Hybrid Literature Review by ChatGPT and Human Experts
Temsah, Omar
Khan, Samina A.
Chaiah, Yazan
Senjab, Abdulrahman
Alhasan, Khalid
Jamal, Amr
Aljamaan, Fadi
Malki, Khalid H.
Halwani, Rabih
Al-Tawfiq, Jaffar A.
Temsah, Mohamad-Hani
Al-Eyadhy, Ayman
CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (04)
[40] Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis
Wei, Qiuhong
Yao, Zhengxiong
Cui, Ying
Wei, Bo
Jin, Zhezhen
Xu, Ximing
JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 151
