Chatbots vs andrologists: Testing 25 clinical cases

Cited: 4
Authors
Perrot, Ophelie [1 ]
Schirmann, Aurelie [1 ]
Vidart, Adrien [1 ]
Guillot-Tantay, Cyrille [1 ]
Izard, Vincent [1 ]
Lebret, Thierry [1 ]
Boillot, Bernard [1 ]
Mesnard, Benoit [1 ]
Lebacle, Cedric [2 ]
Madec, Francois-Xavier [1 ]
Affiliations
[1] Foch Hosp, Urol Dept, Suresnes, France
[2] Kremlin Bicetre Hosp, Urol Dept, Le Kremlin Bicetre, France
Source
FRENCH JOURNAL OF UROLOGY | 2024, Vol. 34, No. 09
Keywords
Artificial intelligence; Andrology; Clinical reasoning; Natural language processing
DOI
10.1016/j.fjurol.2024.102636
CLC classification
R5 [Internal medicine]; R69 [Urology (urogenital diseases)]
Discipline codes
1002; 100201
Abstract
Objective: AI-derived language models are booming, and their place in medicine remains undefined. The aim of our study was to compare responses to andrology clinical cases between chatbots and andrologists, in order to assess the reliability of these technologies.
Materials and methods: We analyzed the responses of 32 experts, 18 residents and three chatbots (ChatGPT v3.5, ChatGPT v4 and Bard) to 25 andrology clinical cases. Responses were scored on a Likert scale from 0 to 2 for each question (0: false or no response; 1: partially correct response; 2: correct response), on the basis of the latest national or, in their absence, international recommendations. We compared the mean scores obtained across all cases by the different groups.
Results: Experts obtained a higher mean score (m = 11.0/12.4, σ = 1.4) than ChatGPT v4 (m = 10.7/12.4, σ = 2.2, p = 0.6475), ChatGPT v3.5 (m = 9.5/12.4, σ = 2.1, p = 0.0062) and Bard (m = 7.2/12.4, σ = 3.3, p < 0.0001). Residents obtained a mean score (m = 9.4/12.4, σ = 1.7) higher than Bard (m = 7.2/12.4, σ = 3.3, p = 0.0053) but lower than ChatGPT v3.5 (m = 9.5/12.4, σ = 2.1, p = 0.8393), ChatGPT v4 (m = 10.7/12.4, σ = 2.2, p = 0.0183) and the experts (m = 11.0/12.4, σ = 1.4, p = 0.0009). ChatGPT v4 (m = 10.7, σ = 2.2) performed better than ChatGPT v3.5 (m = 9.5, σ = 2.1, p = 0.0476) and Bard (m = 7.2, σ = 3.3, p < 0.0001).
Conclusion: The use of chatbots in medicine could be relevant. More studies are needed before integrating them into clinical practice.
Level of evidence: 4.
(c) 2024 Elsevier Masson SAS. All rights reserved.
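The group comparisons in the Results rest on differences in mean scores between raters. A minimal sketch of how such a two-group comparison can be computed, using Welch's t statistic for independent samples with unequal variances; the score lists here are hypothetical illustrations, not the study's data:

```python
# Hypothetical sketch of a two-group mean-score comparison (Welch's t).
# The study compared mean Likert-based scores between rater groups;
# the numbers below are made-up examples, not the published data.
from math import sqrt
from statistics import mean, stdev


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))


# Hypothetical per-case score totals for two groups of raters
experts = [11.2, 10.8, 12.0, 11.5, 9.8]
chatbot = [10.1, 11.0, 8.9, 10.5, 9.9]

t = welch_t(experts, chatbot)  # positive if experts scored higher on average
```

The p-values reported in the abstract would then follow from referring such a statistic to the t distribution with Welch-Satterthwaite degrees of freedom (e.g. via `scipy.stats.ttest_ind(..., equal_var=False)`); this sketch only shows the statistic itself.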
Pages: 7