Performance of AI chatbots on controversial topics in oral medicine, pathology, and radiology

Cited by: 5
Authors
Mohammad-Rahimi, Hossein [1, 2]
Khoury, Zaid H. [3]
Alamdari, Mina Iranparvar [4]
Rokhshad, Rata [2]
Motie, Parisa [5]
Parsa, Azin [6]
Tavares, Tiffany [7]
Sciubba, James J. [8]
Price, Jeffery B. [1, 6]
Sultan, Ahmed S. [1, 6, 9]
Affiliations
[1] Univ Maryland, Sch Dent, Div Artificial Intelligence Res, Baltimore, MD 21201 USA
[2] ITU WHO Focus Grp AI Hlth, Top Grp Dent Diagnost & Digital Dent, Berlin, Germany
[3] Meharry Med Coll, Sch Dent, Dept Oral Diagnost Sci & Res, Nashville, TN USA
[4] Shahid Beheshti Univ Med Sci, Sch Dent, Dept Oral & Maxillofacial Radiol, Tehran, Iran
[5] Med Univ Isfahan, Med Image & Signal Proc Res Ctr, Esfahan, Iran
[6] Univ Maryland, Sch Dent, Dept Oncol & Diagnost Sci, Baltimore, MD 21201 USA
[7] UT Hlth San Antonio Sch Dent, Dept Comprehens Dent, San Antonio, TX USA
[8] Johns Hopkins Univ, Dept Otolaryngol Head & Neck Surg, Baltimore, MD USA
[9] Univ Maryland, Marlene & Stewart Greenebaum Comprehens Canc Ctr, Baltimore, MD 21201 USA
Source
ORAL SURGERY ORAL MEDICINE ORAL PATHOLOGY ORAL RADIOLOGY | 2024 / Vol. 137 / Issue 05
DOI
10.1016/j.oooo.2024.01.015
Chinese Library Classification (CLC) number
R78 [Stomatology];
Discipline classification code
1003;
Abstract
Objectives. In this study, we assessed the responses of 6 artificial intelligence (AI) chatbots (Bing, GPT-3.5, GPT-4, Google Bard, Claude, Sage) to controversial and difficult questions in oral pathology, oral medicine, and oral radiology. Study Design. The chatbots' answers were evaluated by board-certified specialists using a modified version of the global quality score on a 5-point Likert scale. The quality and validity of chatbot citations were also evaluated. Results. Claude had the highest mean score of 4.341 ± 0.582 for oral pathology and medicine; Bing had the lowest, 3.447 ± 0.566. In oral radiology, GPT-4 had the highest mean score of 3.621 ± 1.009 and Bing the lowest, 2.379 ± 0.978. Across all disciplines, GPT-4 achieved the highest mean score of 4.066 ± 0.825. Of the 349 citations generated by the chatbots, 82 (23.50%) were fake. Conclusions. GPT-4 was the best-performing chatbot in providing high-quality information on controversial topics in various dental disciplines. Although the majority of chatbots performed well, it is suggested that developers of AI medical chatbots incorporate scientific citation authenticators to validate output citations, given the relatively high number of fabricated citations. (Oral Surg Oral Med Oral Pathol Oral Radiol 2024;137:508-514)
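The summary figures in the abstract can be illustrated with a minimal Python sketch. It assumes the second number in each reported score is a sample standard deviation, which the abstract does not state. The helper names `summarize` and `fake_share` and the example ratings are hypothetical and are not taken from the study. Only the citation counts (82 of 349) come from the abstract.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample SD) for a list of 1-5 Likert ratings."""
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("Likert ratings must lie between 1 and 5")
    return mean(scores), stdev(scores)


def fake_share(fake, total):
    """Percentage of fabricated citations among all generated citations."""
    return 100 * fake / total


# Hypothetical ratings for illustration only, NOT data from the study.
example_ratings = [4, 5, 4, 3, 5]
m, sd = summarize(example_ratings)
print(f"{m:.3f} ± {sd:.3f}")

# Counts reported in the abstract: 82 fake citations out of 349.
print(f"{fake_share(82, 349):.2f}%")  # 23.50%
```

The second print reproduces the 23.50% reported in the abstract. The first only shows the format of a per-chatbot summary.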
Pages: 508-514
Page count: 7