The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists

Cited by: 8
Authors
Gunay, Serkan [1 ]
Ozturk, Ahmet [1 ]
Yigit, Yavuz [2 ]
Affiliations
[1] Hitit Univ, Corum Erol Olcok Educ & Res Hosp, Dept Emergency Med, Emergency Med, Corum, Turkiye
[2] Hamad Gen Hosp, Dept Emergency Med, Emergency Med, Hamad Med Corp, Doha, Qatar
Keywords
Artificial intelligence; ChatGPT; GPT-4; Gemini; Electrocardiography; GPT-4o
DOI
10.1016/j.ajem.2024.07.043
Chinese Library Classification (CLC)
R4 [Clinical Medicine]
Subject classification codes
1002; 100602
Abstract
Introduction: GPT-4, GPT-4o, and Gemini Advanced, three well-known large language models (LLMs), can recognize and interpret visual data. Only a very limited number of studies have examined the ECG performance of GPT-4, and no study in the literature has examined how well Gemini and GPT-4o evaluate ECGs. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini Advanced in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists.

Methods: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as the reference; it contains two sections, routine daily ECGs and more challenging ECGs. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were then answered by emergency medicine specialists and cardiologists. In the subsequent phase, a diagnostic question was entered daily into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. In the final phase, the responses of the cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically compared across three categories: routine daily ECGs, more challenging ECGs, and all ECGs combined.

Results: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three categories. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on the total set of ECG questions (p = 0.003 and p = 0.042, respectively). Among the models, GPT-4o outperformed Gemini Advanced and GPT-4 on the total set of ECG questions (p = 0.027 and p < 0.001, respectively) and outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Agreement across repeated responses was weak for GPT-4 (p < 0.001, Fleiss kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss kappa = 0.347) and moderate for GPT-4o (p < 0.001, Fleiss kappa = 0.514).

Conclusion: While GPT-4o shows promise, especially on the more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance on routine and overall assessments still lags behind that of human specialists. The limited accuracy and consistency of GPT-4 and Gemini Advanced suggest that their current use in clinical ECG interpretation is risky.
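The abstract summarizes response consistency of each model's repeated daily answers with Fleiss' kappa (e.g., 0.265 for GPT-4, 0.347 for Gemini Advanced, 0.514 for GPT-4o). The sketch below illustrates one way such an agreement statistic could be computed; the data, the number of repeated query days, and the use of statsmodels are illustrative assumptions, not details taken from the study.

    # Illustrative only: Fleiss' kappa over repeated LLM answers to the same ECG cases.
    # Assumes answers have been coded into integer diagnostic categories; the values
    # below are random placeholders, not the study's data.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(0)
    # 40 ECG cases (rows) answered on 3 hypothetical query days (columns).
    coded_answers = rng.integers(low=0, high=5, size=(40, 3))

    # Collapse raw categorical answers into a cases-by-categories count table,
    # then measure agreement across the repeated queries.
    table, _ = aggregate_raters(coded_answers)
    print(f"Fleiss' kappa: {fleiss_kappa(table, method='fleiss'):.3f}")

A kappa near 0 indicates answers that vary almost at random from one query to the next, while values above roughly 0.4 correspond to the "moderate" agreement the abstract reports for GPT-4o.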
Pages: 68-73
Number of pages: 6