Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0

Cited by: 1
Authors
Choi, Jisun [1 ]
Oh, Ah Ran [1 ]
Park, Jungchan [1 ]
Kang, Ryung A. [1 ]
Yoo, Seung Yeon [1 ]
Lee, Dong Jae [1 ]
Yang, Kwangmo [2 ]
Affiliations
[1] Sungkyunkwan Univ, Sch Med, Samsung Med Ctr, Dept Anesthesiol & Pain Med, Seoul, South Korea
[2] Sungkyunkwan Univ, Ctr Hlth Promot, Samsung Med Ctr, Sch Med, Seoul, South Korea
Keywords
ChatGPT; artificial intelligence; quality; quantity; AI chatbot
DOI
10.3389/fmed.2024.1400153
Chinese Library Classification
R5 [Internal Medicine]
Discipline Classification Code
1002; 100201
Abstract
Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide information quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were input into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired-sample t-test compared ChatGPT 3.5 and 4.0.

Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points was the most common rating for 3.5 (36%), while 4 points predominated for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in all three areas.

Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
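The Methods section compares matched rater scores for the two model versions with a paired-sample t-test. As a minimal sketch of that comparison (the scores below are illustrative placeholders, not the study's data, and the ten-rater lists are an assumption for brevity), the t statistic can be computed directly from the per-rater score differences:

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired-sample t statistic: mean of the matched differences
    divided by the standard error of those differences."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical overall-assessment ratings (5-point Likert) from the same raters
# for ChatGPT 3.5 and 4.0 responses to one question.
gpt35 = [3, 3, 4, 2, 3, 4, 3, 3, 2, 4]
gpt40 = [4, 4, 4, 3, 4, 4, 3, 4, 3, 5]

t = paired_t_statistic(gpt35, gpt40)  # positive t indicates higher 4.0 scores
```

The pairing matters because the same rater scores both versions of each response; testing the differences removes between-rater variability that an unpaired test would leave in.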
Pages: 9