Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations

Cited by: 33

Authors:

Alfertshofer, Michael ^{[1]}

Hoch, Cosima C. ^{[2]}

Funk, Paul F. ^{[3]}

Hollmann, Katharina ^{[4]}

Wollenberg, Barbara ^{[2]}

Knoedler, Samuel ^{[5]}

Knoedler, Leonard ^{[5]}

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Div Hand Plast & Aesthet Surg, Ziemssenstr 5, D-80336 Munich, Germany

[2] Tech Univ Munich, Sch Med, Dept Otolaryngol Head & Neck Surg, Ismaningerstr 22, D-81675 Munich, Germany

[3] Friedrich Schiller Univ Jena, Univ Hosp Jena, Dept Otolaryngol Head & Neck Surg, Klinikum 1, D-07747 Jena, Germany

[4] Harvard Med Sch, Massachusetts Gen Hosp, Dept Pathol, 55 Fruit St, Boston, MA 02114 USA

[5] Univ Hosp Regensburg, Dept Plast Hand & Reconstruct Surg, Franz Josef Str Allee 11, D-93053 Regensburg, Germany

Source:

ANNALS OF BIOMEDICAL ENGINEERING | 2024 / Vol. 52 / Issue 06

Keywords:

ChatGPT; OpenAI; Artificial intelligence; Medical education; Clinical decision-making; Medical licensing exams;

DOI:

10.1007/s10439-023-03338-3

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Subject classification code:

0831;

Abstract:

Purpose: The use of AI-powered technology, particularly OpenAI's ChatGPT, holds significant potential to reshape healthcare and medical education. Although existing studies have examined ChatGPT's performance on medical licensing examinations in different nations, a comprehensive, multinational analysis using rigorous methodology is lacking. Our study sought to address this gap by evaluating ChatGPT's performance on six national medical licensing examinations and investigating the relationship between test question length and ChatGPT's accuracy.

Methods: We manually entered a total of 1,800 test questions (300 each from the US, Italian, French, Spanish, UK, and Indian medical licensing examinations) into ChatGPT and recorded the accuracy of its responses.

Results: We found significant variance in ChatGPT's test accuracy across countries, with the highest accuracy on the Italian examination (73% correct answers) and the lowest on the French examination (22% correct answers). Question length correlated with ChatGPT's performance on the Italian and French state examinations only. In addition, questions requiring multiple correct answers, as seen in the French examination, posed a greater challenge to ChatGPT.

Conclusion: Our findings underscore the need for future research to further delineate ChatGPT's strengths and limitations in medical test-taking across additional countries and to develop guidelines to prevent AI-assisted cheating in medical examinations.
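The two quantities named in the abstract, per-exam accuracy and the relationship between question length and correctness, can be illustrated with a minimal Python sketch. The abstract does not state which statistic the authors used, so the Pearson (point-biserial) correlation below is an assumption, as are the function names `accuracy` and `pearson`. The records in `toy` are made-up placeholder values, not data from the study.

```python
from math import sqrt


def accuracy(correct_flags):
    """Return the fraction of questions answered correctly (flags are 1 or 0)."""
    return sum(correct_flags) / len(correct_flags)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    With one binary variable this equals the point-biserial correlation.
    Assumption: this is an illustrative choice, not the paper's stated method.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records: (question length in characters, 1 = correct, 0 = incorrect).
toy = [(120, 1), (340, 0), (95, 1), (410, 0), (150, 1), (280, 1)]
lengths = [length for length, _ in toy]
flags = [flag for _, flag in toy]

print("accuracy:", accuracy(flags))
print("length vs. correctness correlation:", pearson(lengths, flags))
```

Applied to one exam's 300 questions, the same two calls would give that exam's accuracy and a length-correctness correlation. A significance test would be a separate step.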

Pages: 1542-1545

Page count: 4
