Comparative Evaluation of Diagnostic Accuracy Between Google Bard and Physicians

Cited: 25
Authors
Hirosawa, Takanobu [1 ,2 ]
Mizuta, Kazuya [1 ]
Harada, Yukinori [1 ]
Shimizu, Taro [1 ]
Affiliations
[1] Dokkyo Med Univ, Dept Diagnost & Generalist Med, Mibu, Tochigi, Japan
[2] Dokkyo Med Univ, Dept Diagnost & Generalist Med, 880 Kitakobayashi, Mibu, Tochigi 3210293, Japan
Keywords
Clinical decision supporting system; Diagnosis; Diagnostic excellence; Generative artificial intelligence; Large language model; Natural language processing; Artificial intelligence
DOI
10.1016/j.amjmed.2023.08.003
CLC number (Chinese Library Classification)
R5 [Internal Medicine];
Subject classification codes
1002 ; 100201 ;
Abstract
BACKGROUND: In this study, we evaluated the diagnostic accuracy of Google Bard, a generative artificial intelligence (AI) platform.
METHODS: We searched case reports published by our department for difficult or uncommon case descriptions, and used mock cases created by physicians for common case descriptions. We entered each case description into the Google Bard prompt to generate a top-10 differential-diagnosis list. As in previous studies, other physicians created differential-diagnosis lists by reading the same clinical descriptions.
RESULTS: A total of 82 clinical descriptions (52 case reports and 30 mock cases) were used. The accuracy rates of physicians remained higher than those of Google Bard for the top 10 differential diagnoses (82.9% vs 56.1%, P < .001), the top 5 (78.0% vs 53.7%, P = .002), and the top diagnosis (64.6% vs 40.2%, P = .003). Within the case reports specifically, physicians consistently outperformed Google Bard. For the mock cases, the differential-diagnosis lists generated by Google Bard performed no differently from those of the physicians in the top 10 (80.0% vs 96.6%, P = .11) and the top 5 (76.7% vs 96.6%, P = .06), except for the top diagnosis (60.0% vs 90.0%, P = .02).
CONCLUSION: While physicians excelled overall, and particularly on the case reports, Google Bard displayed comparable diagnostic performance on common cases. This suggests that Google Bard has room for further improvement and refinement of its diagnostic capabilities. Generative AIs, including Google Bard, are anticipated to become increasingly beneficial in augmenting diagnostic accuracy.
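The evaluation described above reduces to a top-k accuracy computation over paired judgments of the same 82 cases. Below is a minimal sketch of that computation, assuming exact string matching against the reference diagnosis and McNemar's test for the paired-proportions comparison; the abstract reports P values without naming the test, and all names here are illustrative rather than the authors' code.

```python
# Hedged sketch (not the authors' actual analysis): top-k accuracy for two
# raters over the same cases, with McNemar's test on the paired outcomes.
# McNemar is an assumption; the paper does not state which test was used.
from statsmodels.stats.contingency_tables import mcnemar

def topk_hit(ranked: list[str], reference: str, k: int) -> bool:
    """True if the reference diagnosis appears within the first k entries."""
    return reference in ranked[:k]

def compare_topk(bard: list[list[str]], phys: list[list[str]],
                 refs: list[str], k: int) -> tuple[float, float, float]:
    """Return (Bard accuracy, physician accuracy, McNemar P value) at rank k."""
    b_hits = [topk_hit(d, r, k) for d, r in zip(bard, refs)]
    p_hits = [topk_hit(d, r, k) for d, r in zip(phys, refs)]
    # Paired 2x2 table: rows = Bard correct/incorrect, cols = physicians.
    table = [[0, 0], [0, 0]]
    for b, p in zip(b_hits, p_hits):
        table[0 if b else 1][0 if p else 1] += 1
    res = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
    n = len(refs)
    return sum(b_hits) / n, sum(p_hits) / n, res.pvalue
```

Called with k = 10, 5, and 1 over all 82 cases, compare_topk would yield accuracy pairs and P values of the kind quoted in the results, provided the per-case hit judgments match the authors' adjudication.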
Pages: 1119+
Number of pages: 23