Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks

Cited by: 69
Authors
Sandmann, Sarah [1 ]
Riepenhausen, Sarah [1 ]
Plagwitz, Lucas [1 ]
Varghese, Julian [1 ]
Affiliations
[1] University of Münster, Institute of Medical Informatics, Münster, Germany
Keywords
INFORMATION;
DOI
10.1038/s41467-024-46411-8
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
It is likely that individuals are turning to Large Language Models (LLMs) to seek health advice, much like searching for diagnoses on Google. We evaluate the clinical accuracy of GPT-3.5 and GPT-4 in suggesting the initial diagnosis, examination steps and treatment for 110 medical cases across diverse clinical disciplines. In addition, two model configurations of the open-source Llama 2 LLM are assessed in a sub-study. To benchmark the diagnostic task, we perform a naive Google search for comparison. Overall, GPT-4 performed best, outperforming GPT-3.5 on diagnosis and examination and outperforming Google search on diagnosis. Except for treatment, all three approaches performed better on frequent than on rare diseases. The sub-study indicates slightly lower performance for the Llama models. In conclusion, the commercial LLMs show growing potential for medical question answering across two successive major releases. However, several weaknesses underscore the need for robust, regulated AI models in health care. Open-source LLMs can be a viable option for addressing specific needs regarding data privacy and transparency of training.

People will likely use ChatGPT to seek health advice. Here, the authors show promising performance of ChatGPT and open-source models, but accuracy falls short of the level required for medical question answering. Improvements are expected over time through domain-specific fine-tuning and integration of regulatory requirements.
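To give a concrete sense of the kind of querying the study evaluates, the following is a minimal illustrative sketch (not the authors' actual protocol) of prompting an LLM for the three assessed tasks, initial diagnosis, examination steps and treatment, on a hypothetical case vignette. It assumes the OpenAI Python SDK with an OPENAI_API_KEY in the environment; the vignette text and prompt wording are invented for illustration.

# Illustrative sketch only; the vignette and prompts are hypothetical, not from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical case vignette standing in for one of the 110 evaluated cases
case_vignette = (
    "A 54-year-old man presents with sudden retrosternal chest pain "
    "radiating to the left arm, diaphoresis, and nausea."
)

# The three tasks evaluated in the study
tasks = {
    "diagnosis": "Suggest the most likely initial diagnosis.",
    "examination": "List the next examination steps.",
    "treatment": "Outline the initial treatment.",
}

for task, instruction in tasks.items():
    response = client.chat.completions.create(
        model="gpt-4",  # the study also evaluates GPT-3.5 and Llama 2 configurations
        messages=[
            {"role": "system", "content": "You are a clinical decision support assistant."},
            {"role": "user", "content": f"{case_vignette}\n\n{instruction}"},
        ],
        temperature=0,  # deterministic output simplifies benchmarking
    )
    print(task.upper(), "->", response.choices[0].message.content)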
Pages: 8