Performance of Chat Generative Pre-trained Transformer-4o in the Adult Clinical Cardiology Self-Assessment Program

Cited by: 1
Authors
Malik, Abdulaziz [1 ]
Madias, Christopher [1 ]
Wessler, Benjamin S. [1 ]
Affiliations
[1] Tufts Med Ctr, Cardiovasc Ctr, 800 Washington St, Boston, MA 02111 USA
Source
EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, Vol. 6, Issue 1
Keywords
Medical education; Artificial intelligence; Large language models
DOI
10.1093/ehjdh/ztae077
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Aims: This study evaluates the performance of OpenAI's latest large language model (LLM), Chat Generative Pre-trained Transformer-4o, on the Adult Clinical Cardiology Self-Assessment Program (ACCSAP).
Methods and results: Chat Generative Pre-trained Transformer-4o was tested on 639 ACCSAP questions; after excluding 45 questions containing video clips, 594 questions remained for analysis. The questions included a mix of text-based and static image-based [electrocardiogram (ECG), angiogram, computed tomography (CT) scan, and echocardiogram] formats. The model was allowed one attempt per question. A further evaluation of image-only questions was performed on 25 questions from the database. Chat Generative Pre-trained Transformer-4o correctly answered 69.2% (411/594) of the questions. Performance was higher for text-only questions (73.9%) than for those requiring image interpretation (55.3%, P < 0.001). The model performed worse on questions involving ECGs, with a correct rate of 56.5% compared with 73.3% for non-ECG questions (P < 0.001). Although the model can interpret medical images in the context of a text-based question, its accuracy varied, showing both strengths and notable gaps in diagnostic accuracy. It was inaccurate when reading images (ECGs, echocardiograms, and angiograms) presented without clinical context.
Conclusion: Chat Generative Pre-trained Transformer-4o performed moderately well on ACCSAP questions. However, its performance remains inconsistent, especially in interpreting ECGs. These findings highlight both the potential and the current limitations of using LLMs in medical education and clinical decision-making.
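The P < 0.001 subgroup comparisons reported above are two-proportion tests on a 2x2 contingency table. The sketch below reproduces the text-only vs. image-based comparison with a chi-squared test. Note the abstract does not report subgroup denominators, so the counts are back-solved from the reported overall result (411/594) and the stated percentages (444 text questions with 328 correct, about 73.9%; 150 image questions with 83 correct, about 55.3%); treat them as an illustrative reconstruction under those assumptions, not the paper's actual data or necessarily its statistical method (Fisher's exact test is another common choice).

# Illustrative sketch of the abstract's two-proportion comparison
# (text-only 73.9% vs. image-based 55.3% accuracy, P < 0.001).
# Subgroup counts are NOT reported in the abstract; the values below are
# back-solved from 411/594 overall and the stated percentages, and the
# paper's actual test may differ (e.g. Fisher's exact test).
from scipy.stats import chi2_contingency

correct = {"text": 328, "image": 83}   # reconstructed correct answers
total = {"text": 444, "image": 150}    # reconstructed question counts

# 2x2 contingency table: rows = question type, columns = correct/incorrect
table = [
    [correct["text"], total["text"] - correct["text"]],
    [correct["image"], total["image"] - correct["image"]],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"text accuracy:  {correct['text'] / total['text']:.1%}")
print(f"image accuracy: {correct['image'] / total['image']:.1%}")
print(f"chi-squared = {chi2:.2f}, P = {p:.2e}")  # P well below 0.001, as reported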
Pages: 155-158
Page count: 4