Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Cited by: 173
Authors
Walker, Harriet Louise [1]
Ghani, Shahi [1]
Kuemmerli, Christoph [2]
Nebiker, Christian Andreas [3]
Muller, Beat Peter [2]
Raptis, Dimitri Aristotle [4]
Staubli, Sebastian Manuel [1,2]
Affiliations
[1] Royal Free London NHS Fdn Trust, Pond St, London NW3 2QG, England
[2] Clarunis Univ Ctr Gastrointestinal & Liver Dis, Basel, Switzerland
[3] Kantonsspital Aarau, Dept Chirurg, Aarau, Switzerland
[4] King Faisal Specialist Hosp & Res Ctr, Organ Transplant Ctr Excellence, Riyadh, Saudi Arabia
Keywords
artificial intelligence; internet information; patient information; ChatGPT; EQIP tool; chatbot; chatbots; conversational agent; conversational agents; internal medicine; pancreas; liver; hepatic; biliary; gall; bile; gallstone; pancreatitis; pancreatic; medical information
DOI
10.2196/47479
CLC number
R19 [Health organizations and services (health services administration)]
Abstract
Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI.

Objective: We aimed to assess the reliability of medical information provided by ChatGPT.

Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI's answers was assessed by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT.

Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) of a possible 36 items. By subsection, the median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and the answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss kappa was 0.78 (P<.001), indicating substantial agreement. The internal consistency of the answers provided by ChatGPT was 100%.

Conclusions: ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.
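The interrater agreement reported in the Results was quantified with the Fleiss kappa. As an illustration of how that statistic is computed, the following is a minimal sketch; the ratings matrix is invented for demonstration and does not reproduce the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts: one row per rated item, one column per category; each cell
    holds the number of raters who assigned that category, and every
    row sums to the same (constant) number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Proportion of all ratings that fall in each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Observed agreement per item, then averaged over items
    p_item = [(sum(c * c for c in row) - n_raters) /
              (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_item) / n_items
    # Expected agreement by chance
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 2 raters scoring 5 items as
# agree/disagree with the guideline recommendation
ratings = [[2, 0], [2, 0], [0, 2], [2, 0], [1, 1]]
print(round(fleiss_kappa(ratings), 3))  # → 0.524
```

By convention (Landis and Koch), kappa values between 0.61 and 0.80, such as the 0.78 reported in the study, are interpreted as substantial agreement.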
Pages: 9