Accuracy, readability, and understandability of large language models for prostate cancer information to the public

Cited by: 16
Authors
Hershenhouse, Jacob S. [1 ,2 ,3 ]
Mokhtar, Daniel [1 ,2 ,3 ]
Eppler, Michael B. [1 ,2 ,3 ]
Rodler, Severin [1 ,2 ,3 ]
Ramacciotti, Lorenzo Storino [1 ,2 ,3 ]
Ganjavi, Conner [1 ,2 ,3 ]
Hom, Brian [1 ,2 ,3 ]
Davis, Ryan J. [1 ,2 ,3 ]
Tran, John [1 ,2 ,3 ]
Russo, Giorgio Ivan [4 ]
Cocci, Andrea [5 ]
Abreu, Andre [1 ,2 ,3 ]
Gill, Inderbir [1 ,2 ,3 ]
Desai, Mihir [1 ,2 ]
Cacciamani, Giovanni E. [1 ,2 ,3 ]
Affiliations
[1] Univ Southern Calif, USC Inst Urol, Keck Sch Med, Los Angeles, CA 90007 USA
[2] Univ Southern Calif, Keck Sch Med, Catherine & Joseph Aresty Dept Urol, Los Angeles, CA 90007 USA
[3] Univ Southern Calif, USC Inst Urol, Ctr Artificial Intelligence, Los Angeles, CA 90007 USA
[4] Univ Catania, Urol Sect, Catania, Italy
[5] Univ Florence, Urol Sect, Florence, Italy
DOI
10.1038/s41391-024-00826-y
Chinese Library Classification: R73 [Oncology]
Discipline Code: 100214
Abstract
BACKGROUND: Generative Pretrained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspectives while optimizing outputs for patient consumption.

METHODS: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. The responses were then re-input into ChatGPT to create simplified summaries understandable at a sixth-grade reading level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey of urology providers (urologists and urologists in training) rated the original ChatGPT responses for accuracy, completeness, and clarity on a 5-point Likert scale. Two independent reviewers then evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk): participants rated clarity and demonstrated their understanding through a multiple-choice question.

RESULTS: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Layperson summaries were more readable than the original GPT outputs (original ChatGPT v. simplified ChatGPT, mean (SD): Flesch Reading Ease 36.5 (9.1) v. 70.2 (11.2), p < 0.0001; Gunning Fog 15.8 (1.7) v. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level 12.8 (1.2) v. 7.4 (1.7), p < 0.0001; Coleman-Liau Index 13.7 (2.1) v. 8.6 (2.4), p = 0.0002; SMOG Index 11.8 (1.2) v. 6.7 (1.8), p < 0.0001; Automated Readability Index 13.1 (1.4) v. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).

CONCLUSION: GPT shows promise for delivering correct patient education on prostate cancer-related content, but the technology is not designed for delivering patient information. Prompting the model to respond with accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.
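The abstract does not state which validated readability tool the authors used, but the reported Flesch metrics follow standard published formulas. As an illustrative sketch only (not the authors' method), the Flesch Reading Ease and Flesch-Kincaid Grade Level can be computed as follows; the syllable counter here is a naive vowel-group heuristic, whereas validated tools use more careful syllabification.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable heuristic: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith(("le", "ee")):
        n -= 1  # e.g. "make" -> 1 syllable
    return max(n, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

Note that the two scales run in opposite directions: a higher Reading Ease means easier text, while a lower Grade Level means easier text, which is why the simplified summaries score higher on the former and lower on the latter.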
Pages: 394-399 (6 pages)