Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing

Cited by: 19

Authors:

Sumbal, Anusha [1]

Sumbal, Ramish [1]

Amir, Alina [1]

Affiliation:

[1] Dow Univ Hlth Sci, Baba E Urdu Rd, Karachi 74200, Pakistan

Source:

JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT | 2024 / Vol. 11

Keywords:

ChatGPT; academic performance; medical education; artificial intelligence; digital health; medicine

DOI:

10.1177/23821205241238641

Chinese Library Classification:

G40 [Education];

Discipline classification codes:

040101 ; 120403 ;

Abstract:

OBJECTIVE: We aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams.
METHOD: Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PubMed/MEDLINE, Google Scholar, and Cochrane. Articles from database inception until April 4, 2023, were queried. A formal narrative analysis was conducted by systematically grouping the similarities and differences between individual findings.
RESULTS: After rigorous screening, 12 articles were included in this review. All the selected papers assessed the academic performance of ChatGPT-3.5. One study compared the performance of ChatGPT-3.5 with that of ChatGPT-4 on a medical exam. Overall, ChatGPT performed well in 4 tests, average in 4 tests, and poorly in 4 tests. ChatGPT's performance was directly proportional to the level of the questions' difficulty but was unremarkable with respect to whether the questions were binary, descriptive, or MCQ-based. ChatGPT's explanation, reasoning, memory, and accuracy were remarkably good, whereas it failed to understand image-based questions and lacked insight and critical thinking.
CONCLUSION: ChatGPT-3.5 performed satisfactorily in the exams it took as an examinee. However, future studies are needed to fully explore the potential of ChatGPT in medical education.


Pages: 12
