Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain

Cited by: 3
Authors
Muluk, Selkin Yilmaz [1 ]
Olcucu, Nazli [2 ]
Affiliations
[1] Antalya City Hosp, Phys Med & Rehabil, Antalya, Turkiye
[2] Antalya Ataturk State Hosp, Phys Med & Rehabil, Antalya, Turkiye
Keywords
googlebard; chatgpt; red flags; health information; artificial intelligence; low back pain; GUIDELINES; MANAGEMENT;
DOI
10.7759/cureus.63580
Chinese Library Classification (CLC)
R5 [Internal Medicine];
Discipline Code
1002 ; 100201 ;
Abstract
Background: Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions marked by 'red flags' (RF), such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed the accuracy of ChatGPT-3.5 (OpenAI, San Francisco, CA, USA) and GoogleBard (Google, Mountain View, CA, USA) in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. Methods: We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Among them, 58 had a single symptom (SS), and 12 had multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were assessed by two authors for accuracy, completeness, and relevance (ACR) using a 5-point rubric. Results: Cohen's kappa values (0.60-0.81) indicated significant agreement between the authors. The average scores for responses to the 58 SS questions ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard, and for the 12 MS questions from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard. The ratings for these responses ranged from 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05). Conclusions: In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
Pages: 13
Related Papers
3 records
  • [1] Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models
    Koyun, Mustafa
    Taskent, Ismail
    JOURNAL OF CLINICAL MEDICINE, 2025, 14 (02)
  • [2] Impact of artificial intelligence in managing musculoskeletal pathologies in physiatry: a qualitative observational study evaluating the potential use of ChatGPT versus Copilot for patient information and clinical advice on low back pain
    Ah-Yan, Christophe
    Boissonnault, Eve
    Boudier-Reveret, Mathieu
    Mares, Christopher
    JOURNAL OF YEUNGNAM MEDICAL SCIENCE, 2025, 42
  • [3] An update on animal models of intervertebral disc degeneration and low back pain: Exploring the potential of artificial intelligence to improve research analysis and development of prospective therapeutics
    Alini, Mauro
    Diwan, Ashish D.
    Erwin, W. Mark
    Little, Christopher B.
    Melrose, James
    JOR SPINE, 2023, 6 (01):