Is artificial intelligence ready to replace specialist doctors entirely? ENT specialists vs ChatGPT: 1-0, ball at the center

Cited by: 19
Authors
Dallari, Virginia [1 ,2 ]
Sacchetto, Andrea [1 ,3 ]
Saetti, Roberto [3 ]
Calabrese, Luca [4 ]
Vittadello, Fabio [5 ]
Gazzini, Luca [1 ,4 ]
Affiliations
[1] Y CEORL HNS, Young Confederat European ORL HNS, Dublin, Ireland
[2] Univ Verona, Head & Neck Dept, Unit Otorhinolaryngol, Piazzale LA Scuro 10, I-37134 Verona, Italy
[3] AULSS 8 Berica, Osped San Bortolo, Dept Otorhinolaryngol, Vicenza, Italy
[4] Paracelsus Med Univ PMU, Hosp Bolzano SABES ASDAA, Dept Otorhinolaryngol Head & Neck Surg, Teaching Hosp, Bolzano, Italy
[5] Explora Res & Stat Anal, Padua, Italy
Keywords
Machine learning; ChatGPT; Otolaryngology; Natural language processing; Research;
DOI
10.1007/s00405-023-08321-1
Chinese Library Classification
R76 [Otorhinolaryngology];
Subject classification code
100213;
Abstract
Purpose: The purpose of this study is to evaluate ChatGPT's responses to Ear, Nose and Throat (ENT) clinical cases and compare them with the responses of ENT specialists.
Methods: We hypothesized 10 scenarios based on everyday ENT practice, each with the same primary symptom, and constructed 20 clinical cases, 2 for each scenario. We presented them to 3 ENT specialists and to ChatGPT. The difficulty of the clinical cases was assessed by the 5 ENT authors of this article. ChatGPT's responses were rated by the 5 ENT authors for correctness and for consistency with the responses of the 3 ENT experts. To verify the stability of ChatGPT's responses, we repeated the queries, always from the same account, on 5 consecutive days.
Results: Among the 20 cases, 8 were rated as low complexity, 6 as moderate complexity and 6 as high complexity. The overall mean correctness and consistency scores of ChatGPT's responses were 3.80 (SD 1.02) and 2.89 (SD 1.24), respectively. We did not find a statistically significant difference in ChatGPT's mean correctness or consistency score according to case complexity. The total intraclass correlation coefficient (ICC) for the stability of correctness and consistency was 0.763 (95% confidence interval [CI] 0.553-0.895) and 0.837 (95% CI 0.689-0.927), respectively.
Conclusions: Our results reveal the potential usefulness of ChatGPT in ENT diagnosis. The instability of its responses and its inability to recognise certain clinical elements are its main limitations.
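The stability figures above rest on an intraclass correlation coefficient computed over repeated ratings. As a hedged illustration only (this is not the authors' code, and the toy data, column names, and random seed below are invented assumptions), the following Python sketch shows how a day-to-day stability ICC of this kind can be obtained with the pingouin library, treating the 5 query days as "raters" of the 20 cases.

# Minimal sketch, not the study's actual analysis: estimate how stable
# per-case scores are across 5 repeated days using a two-way ICC.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)  # seed chosen arbitrarily for the toy data

# 20 clinical cases rated on a 1-5 scale on 5 consecutive days; "day" plays
# the role of the rater when quantifying response stability.
records = []
for case in range(1, 21):
    base = rng.integers(2, 6)        # latent correctness level for this case
    for day in range(1, 6):
        noise = rng.integers(-1, 2)  # small day-to-day fluctuation
        records.append({"case": case, "day": day,
                        "score": int(np.clip(base + noise, 1, 5))})
df = pd.DataFrame(records)

# pingouin returns the full ICC table (ICC1 through ICC3k); which row to
# report depends on the model chosen, each with its 95% CI.
icc = pg.intraclass_corr(data=df, targets="case", raters="day", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

Under this setup, an ICC near the reported 0.76-0.84 range would indicate that the per-case scores are largely reproducible from one day to the next.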
Pages: 995-1023
Number of pages: 29