ChatGPT Performs Worse on USMLE-Style Ethics Questions Compared to Medical Knowledge Questions
Times cited: 0
Authors:
Danehy, Tessa [1]
Hecht, Jessica [1]
Kentis, Sabrina [1]
Schechter, Clyde B. [2]
Jariwala, Sunit P. [3]
Affiliations:
[1] Albert Einstein Coll Med, Montefiore Med Ctr, Bronx, NY 10461 USA
[2] Albert Einstein Coll Med, Dept Family & Social Med, Bronx, NY USA
[3] Albert Einstein Coll Med, Div Allergy Immunol, Montefiore Med Ctr, Bronx, NY USA
Source: APPLIED CLINICAL INFORMATICS, 2024, Vol. 15, Issue 05
Keywords: ChatGPT; large language model; artificial intelligence; medical education; USMLE; ethics
DOI: 10.1055/a-2405-0138
Chinese Library Classification number: R-058
Abstract:

Objectives: The main objective of this study is to evaluate the ability of the large language model Chat Generative Pre-trained Transformer (ChatGPT) to accurately answer United States Medical Licensing Examination (USMLE) board-style medical ethics questions compared with medical knowledge-based questions. Additional objectives are to compare the overall accuracy of GPT-3.5 with that of GPT-4 and to assess the variability of the responses given by each version.

Methods: Using AMBOSS, a third-party USMLE Step exam test-preparation service, we selected one group of 27 medical ethics questions and a second group of 27 medical knowledge questions, matched on question difficulty for medical students. We ran 30 trials asking these questions of GPT-3.5 and GPT-4 and recorded the output. A random-effects linear probability regression model was used to evaluate accuracy, and a Shannon entropy calculation was used to evaluate response variation.

Results: Both versions of ChatGPT performed worse on medical ethics questions than on medical knowledge questions. GPT-4 performed 18 percentage points worse on medical ethics questions than on medical knowledge questions (p < 0.05), and GPT-3.5 performed 7 percentage points worse, a difference that was not statistically significant (p = 0.41). GPT-4 outperformed GPT-3.5 by 22 percentage points on medical ethics (p < 0.001) and by 33 percentage points on medical knowledge (p < 0.001). GPT-4 also exhibited lower overall Shannon entropy for medical ethics and medical knowledge questions (0.21 and 0.11, respectively) than GPT-3.5 (0.59 and 0.55, respectively), indicating lower variability in its responses.

Conclusion: Both versions of ChatGPT performed more poorly on medical ethics questions than on medical knowledge questions. GPT-4 significantly outperformed GPT-3.5 on overall accuracy and exhibited significantly lower variability in its answer choices. This underscores the need for ongoing assessment of ChatGPT versions for use in medical education.
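The abstract reports Shannon entropy as its measure of response variability but does not give the exact computation. The Python sketch below illustrates one plausible reading, under stated assumptions: entropy is computed per question over the answer letters recorded across repeated trials, logarithms are base 2, and per-question values are averaged across a question set. The function names, data layout, and example data are hypothetical, and the sketch is not expected to reproduce the reported values.

```python
import math
from collections import Counter


def shannon_entropy(responses, base=2.0):
    """Shannon entropy of the answer-choice distribution for one question.

    `responses` holds the answer letters recorded across repeated trials.
    Base-2 logarithms (bits) are an assumption; the abstract does not state a base.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one recorded response")
    counts = Counter(responses)
    # H = -sum(p * log p) over the answer choices that actually occurred;
    # choices never selected contribute nothing (0 * log 0 is taken as 0).
    return sum(-(c / n) * math.log(c / n, base) for c in counts.values())


def mean_entropy(trials_by_question, base=2.0):
    """Average per-question entropy across a question set (hypothetical aggregation)."""
    values = [shannon_entropy(r, base) for r in trials_by_question.values()]
    return sum(values) / len(values)


# Illustrative data only, not the study's recorded outputs.
trials = {
    "q1": ["A"] * 30,               # identical answer in all 30 trials: entropy 0
    "q2": ["A"] * 15 + ["B"] * 15,  # even split between two choices: 1 bit
}
print(mean_entropy(trials))
```

Under this reading, an entropy of 0 means the model chose the same answer in every trial, and larger values mean its choices were spread over more options; this is the sense in which lower entropy indicates lower response variability.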