Performance of Chat Generative Pre-trained Transformer-4o in the Adult Clinical Cardiology Self-Assessment Program

Cited by: 1
Authors
Malik, Abdulaziz [1 ]
Madias, Christopher [1 ]
Wessler, Benjamin S. [1 ]
Affiliations
[1] Tufts Med Ctr, Cardiovasc Ctr, 800 Washington St, Boston, MA 02111 USA
Source
EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, Vol. 6, Issue 1
Keywords
Medical education; Artificial intelligence; Large language models
DOI
10.1093/ehjdh/ztae077
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Aims: This study evaluates the performance of OpenAI's latest large language model (LLM), Chat Generative Pre-trained Transformer-4o, on the Adult Clinical Cardiology Self-Assessment Program (ACCSAP).
Methods and results: Chat Generative Pre-trained Transformer-4o was tested on 639 ACCSAP questions; after excluding 45 questions containing video clips, 594 questions remained for analysis. The questions included a mix of text-based and static image-based [electrocardiogram (ECG), angiogram, computed tomography (CT) scan, and echocardiogram] formats. The model was allowed one attempt per question. A further evaluation of image-only questions was performed on 25 questions from the database. Chat Generative Pre-trained Transformer-4o correctly answered 69.2% (411/594) of the questions. Performance was higher for text-only questions (73.9%) than for those requiring image interpretation (55.3%, P < 0.001). The model performed worse on questions involving ECGs, with a correct rate of 56.5% compared with 73.3% for non-ECG questions (P < 0.001). Although the model can interpret medical images in the context of a text-based question, its accuracy varied, showing both strengths and notable gaps in diagnostic accuracy. It was inaccurate when reading images (ECGs, echocardiograms, and angiograms) presented without clinical context.
Conclusion: Chat Generative Pre-trained Transformer-4o performed moderately well on ACCSAP questions. However, its performance remains inconsistent, especially in interpreting ECGs. These findings highlight both the potential and the current limitations of using LLMs in medical education and clinical decision-making.
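The P < 0.001 subgroup comparisons reported above are two-proportion tests on a 2x2 contingency table. The sketch below reproduces the text-only vs. image-based comparison with a chi-squared test. Note the abstract does not report subgroup denominators, so the counts are back-solved from the reported overall result (411/594) and the stated percentages (444 text questions with 328 correct, about 73.9%; 150 image questions with 83 correct, about 55.3%); treat them as an illustrative reconstruction under those assumptions, not the paper's actual data or necessarily its statistical method (Fisher's exact test is another common choice).

# Illustrative sketch of the abstract's two-proportion comparison
# (text-only 73.9% vs. image-based 55.3% accuracy, P < 0.001).
# Subgroup counts are NOT reported in the abstract; the values below are
# back-solved from 411/594 overall and the stated percentages, and the
# paper's actual test may differ (e.g. Fisher's exact test).
from scipy.stats import chi2_contingency

correct = {"text": 328, "image": 83}   # reconstructed correct answers
total = {"text": 444, "image": 150}    # reconstructed question counts

# 2x2 contingency table: rows = question type, columns = correct/incorrect
table = [
    [correct["text"], total["text"] - correct["text"]],
    [correct["image"], total["image"] - correct["image"]],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"text accuracy:  {correct['text'] / total['text']:.1%}")
print(f"image accuracy: {correct['image'] / total['image']:.1%}")
print(f"chi-squared = {chi2:.2f}, P = {p:.2e}")  # P well below 0.001, as reported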
Pages: 155-158
Page count: 4