Assessing GPT-4's accuracy in answering clinical pharmacological questions on pain therapy

Cited by: 1

Authors:

Stroop, Anna ^{[1]}

Stroop, Tabea ^{[2]}

Alsofy, Samer Zawy ^{[1,3]}

Wegner, Moritz ^{[4,5]}

Nakamura, Makoto ^{[1]}

Stroop, Ralf ^{[1,6]}

Affiliations:

[1] Witten Herdecke Univ, Fac Hlth, Dept Med, Alfred Herrhausen Str 45, D-58455 Witten, Germany

[2] Philipps Univ Marburg, Marburg, Germany

[3] Univ Munster, St Barbara Hosp, Dept Neurosurg, Acad Hosp, Hamm, Germany

[4] Univ Cologne, Fac Med, Dept Vasc & Endovascular Surg, Cologne, Germany

[5] Univ Hosp Cologne, Cologne, Germany

[6] Med Sch Hamburg, Hamburg, Germany

Source:

BRITISH JOURNAL OF CLINICAL PHARMACOLOGY | 2025 / Vol. 91 / Issue 08

Keywords:

artificial intelligence; ChatGPT; drug interactions; GPT-4; large language models; pain management; pharmacology

DOI:

10.1002/bcp.70036

Chinese Library Classification (CLC) number:

R9 [Pharmacy]

Discipline classification code:

1007

Abstract:

Aims: This study aimed to evaluate the accuracy and completeness of GPT-4, a large language model, in answering clinical pharmacological questions related to pain therapy, with a focus on its potential as a tool for delivering patient-facing medical information. The objective was to assess its reliability in delivering medical information in the context of pain management.

Methods: A cross-sectional survey-based study was conducted with healthcare professionals, including physicians and pharmacists. Participants submitted up to 8 clinical pharmacology questions on pain management, focusing on drug interactions, dosages and contraindications. GPT-4's responses were evaluated based on comprehensibility, detail, satisfaction, medical-pharmacological accuracy and completeness. Additionally, responses were compared to the German Drug Directory to assess their accuracy.

Results: The majority of participants (99%) found GPT-4's responses comprehensible, while 84% considered the information detailed enough. Overall satisfaction was high, with 93% expressing satisfaction, and 96% deemed the responses medically accurate. However, only 63% rated the information as complete, with some identifying gaps in pharmacokinetics and drug interaction data. Usability was evaluated as good to excellent, with a System Usability Scale score of 83.38 (± 10.26).

Conclusion: GPT-4 demonstrates potential as a tool for delivering medical information, particularly in pain management. However, limitations such as incomplete pharmacological data and the potential for contextual carryover in follow-up questions suggest that further refinement is necessary. Developing specialized artificial intelligence tools that integrate real-time pharmacological databases could improve accuracy and reliability for clinical decision-making.
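For readers unfamiliar with the System Usability Scale figure reported in the abstract, the following is a minimal sketch of the standard SUS scoring rule (ten items answered on a 1-5 scale; odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5). The function name `sus_score` and the sample answers are hypothetical and serve only as illustration; they are not the study's data, and the record does not state how the authors computed their score.

```python
from statistics import mean, stdev


def sus_score(responses):
    """Return the SUS score (0-100) for one respondent's ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(answer not in (1, 2, 3, 4, 5) for answer in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = answer - 1
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - answer
            total += 5 - answer
    # Scale the 0-40 raw sum to the 0-100 SUS range
    return total * 2.5


# Hypothetical answers from three respondents (NOT data from the study)
example_responses = [
    [5, 1, 5, 2, 4, 1, 5, 1, 4, 2],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]

scores = [sus_score(r) for r in example_responses]
# Summarize as mean and sample standard deviation, the form used in the abstract
print(f"SUS: {mean(scores):.2f} (± {stdev(scores):.2f})")
```

A per-respondent score is computed first and the scores are then averaged, which is why a result such as the one in the abstract is reported as a mean with a standard deviation.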

Pages: 2294-2303

Page count: 10

