GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study

Cited by: 0
Authors
Shmilovitch, Amit Haim [1]
Katson, Mark [1]
Cohen-Shelly, Michal [2]
Peretz, Shlomi [3, 4]
Aran, Dvir [5, 6]
Shelly, Shahar [1, 7]
Affiliations
[1] Rambam Med Ctr, Dept Neurol, HaAliya HaShniya St 8,POB 9602, IL-3109601 Haifa, Israel
[2] Chaim Sheba Med Ctr, ARC Innovat Ctr, Sagol AI Hub, Ramat Gan, Israel
[3] Shamir Med Ctr, Dept Neurol, Beer Yaagov, Israel
[4] Tel Aviv Univ, Sackler Sch Med, Tel Aviv, Israel
[5] Technion Israel Inst Technol, Fac Biol, Haifa, Israel
[6] Technion Israel Inst Technol, Taub Fac Comp Sci, Haifa, Israel
[7] Technion Israel Inst Technol, Rapaport Fac Med, Haifa, Israel
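Source
JMIR AI | 2025 / Vol. 4
Keywords
GPT-4; ischemic stroke; clinical decision support; artificial intelligence; neurology; AI;
DOI
10.2196/60391
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];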
Abstract
Background: Cerebrovascular diseases are the second most common cause of death worldwide and one of the major causes of disability burden. Advancements in artificial intelligence have the potential to revolutionize health care delivery, particularly in critical decision-making scenarios such as ischemic stroke management.
Objective: This study aims to evaluate the effectiveness of GPT-4 in providing clinical support for emergency department neurologists by comparing its recommendations with expert opinions and real-world outcomes in acute ischemic stroke management.
Methods: A cohort of 100 patients with acute stroke symptoms was retrospectively reviewed. Data used for decision-making included patients' history, clinical evaluation, imaging study results, and other relevant details. Each case was independently presented to GPT-4, which provided scaled recommendations (1-7) regarding the appropriateness of treatment, the use of tissue plasminogen activator, and the need for endovascular thrombectomy. Additionally, GPT-4 estimated the 90-day mortality probability for each patient and elucidated its reasoning for each recommendation. The recommendations were then compared with a stroke specialist's opinion and actual treatment decisions.
Results: In our cohort of 100 patients, treatment recommendations by GPT-4 showed strong agreement with expert opinion (area under the curve [AUC] 0.85, 95% CI 0.77-0.93) and real-world treatment decisions (AUC 0.80, 95% CI 0.69-0.91). GPT-4 showed near-perfect agreement with real-world decisions in recommending endovascular thrombectomy (AUC 0.94, 95% CI 0.89-0.98) and strong agreement for tissue plasminogen activator treatment (AUC 0.77, 95% CI 0.68-0.86). Notably, in some cases, GPT-4 recommended more aggressive treatment than human experts, with 11 instances where GPT-4 suggested tissue plasminogen activator use against expert opinion. For mortality prediction, GPT-4 accurately identified 10 (77%) out of 13 deaths within its top 25 high-risk predictions (AUC 0.89, 95% CI 0.8077-0.9739; hazard ratio 6.98, 95% CI 2.88-16.9; P<.001), outperforming supervised machine learning models such as PRACTICE (AUC 0.70; log-rank P=.02) and PREMISE (AUC 0.77; P=.07).
Conclusions: This study demonstrates the potential of GPT-4 as a viable clinical decision-support tool in the management of acute stroke. Its ability to provide explainable recommendations without requiring structured data input aligns well with the routine workflows of treating physicians. However, the tendency toward more aggressive treatment recommendations highlights the importance of human oversight in clinical decision-making. Future studies should focus on prospective validations and exploring the safe integration of such artificial intelligence tools into clinical practice.
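The abstract reports agreement as AUC between a 1-7 recommendation scale and a binary reference decision, and mortality performance as the share of deaths captured among the top-ranked risk estimates. The sketch below illustrates how such figures can be computed in principle. It is not the authors' analysis code: the function names `auc_from_scores` and `top_k_recall` and all sample values are hypothetical, chosen only to show the arithmetic.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case. Ties count as half a win, which matters on a
    coarse ordinal scale such as 1-7."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def top_k_recall(risks: Sequence[float], died: Sequence[int], k: int) -> float:
    """Fraction of all deaths that fall within the k highest risk estimates."""
    total_deaths = sum(died)
    if total_deaths == 0:
        raise ValueError("no deaths in the cohort")
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    return sum(died[i] for i in ranked[:k]) / total_deaths


# Hypothetical toy data, NOT taken from the study.
toy_scores = [7, 6, 6, 3, 2, 5, 1, 4]      # model recommendation, 1-7 scale
toy_treated = [1, 1, 0, 0, 0, 1, 0, 1]     # reference decision (1 = treated)
toy_auc = auc_from_scores(toy_scores, toy_treated)

toy_risks = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]  # estimated mortality probability
toy_died = [1, 0, 0, 0, 1, 1]               # observed outcome (1 = died)
toy_recall = top_k_recall(toy_risks, toy_died, k=3)

# The percentage reported in the abstract follows directly from its counts.
reported_share = round(10 / 13 * 100)       # 10 of 13 deaths in the top 25
```

Counting ties as half a win is one common convention for ordinal scores; confidence intervals such as those in the abstract would need an additional step (for example, bootstrapping), which this sketch does not attempt.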
Pages: 11