Experimental assessment of the performance of artificial intelligence in solving multiple-choice board exams in cardiology

Cited by: 1
Authors
Huwiler, Jessica [1 ,2 ]
Oechslin, Luca [1 ]
Biaggi, Patric [1 ,2 ]
Tanner, Felix C. [2 ,3 ]
Wyss, Christophe Alain [1 ,2 ,3 ]
Affiliations
[1] Heart Clin Zurich, Zurich, Switzerland
[2] Univ Zurich, Zurich, Switzerland
[3] Swiss Soc Cardiol, Basel, Switzerland
Keywords
EUROPEAN EXAM; CHATGPT; HEALTH
DOI
10.57187/s.3547
Chinese Library Classification (CLC)
R5 [Internal Medicine]
Discipline Code
1002; 100201
Abstract
AIMS: The aim of the present study was to evaluate the performance of various artificial intelligence (AI)-powered chatbots (commercially available in Switzerland up to June 2023) in solving a theoretical cardiology board exam and to compare their accuracy with that of human cardiology fellows.
METHODS: A set of 88 multiple-choice cardiology exam questions was presented to the participating cardiology fellows and to the selected chatbots. The evaluation metrics included Top-1 and Top-2 accuracy, assessing the ability of chatbots and fellows to select the correct answer.
RESULTS: All 36 participating cardiology fellows passed the exam, with a median accuracy of 98% (IQR 91-99%, range 78-100%). The performance of the chatbots varied: only one chatbot, Jasper quality, reached the minimum pass threshold of 73% correct answers. Across the chatbots, the median Top-1 accuracy was 47% (IQR 44-53%, range 42-73%), while Top-2 accuracy yielded a modest improvement, with a median of 67% (IQR 65-72%, range 61-82%). Even with this advantage, only two chatbots, Jasper quality and ChatGPT Plus 4.0, would have passed the exam. Similar results were observed when picture-based questions were excluded from the dataset.
CONCLUSIONS: Overall, the study suggests that most current language-based chatbots have limitations in accurately solving theoretical medical board exams. In general, currently widely available chatbots fell short of a passing score on a theoretical cardiology board exam, although a few showed promising results. Further improvements in AI language models may lead to better performance in medical knowledge applications.
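Note on the metrics: the sketch below illustrates one way Top-1 and Top-2 accuracy can be computed for multiple-choice answers, assuming each response is a ranked list of option labels. The function name and example data are illustrative assumptions, not taken from the paper.

# Illustrative sketch (assumption, not the authors' scoring code): Top-k accuracy
# is the fraction of questions whose correct option appears among the first k
# ranked choices given by a respondent.
def top_k_accuracy(ranked_answers, answer_key, k):
    hits = sum(1 for ranked, correct in zip(ranked_answers, answer_key)
               if correct in ranked[:k])
    return hits / len(answer_key)

# Hypothetical example: four questions, the answer key, and one chatbot's ranked picks.
answer_key = ["A", "C", "B", "D"]
ranked_answers = [["A", "B"], ["B", "C"], ["B", "A"], ["A", "C"]]
print(f"Top-1 accuracy: {top_k_accuracy(ranked_answers, answer_key, 1):.0%}")  # 50%
print(f"Top-2 accuracy: {top_k_accuracy(ranked_answers, answer_key, 2):.0%}")  # 75%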
Pages: 8