Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0

Cited by: 1
Authors
Choi, Jisun [1 ]
Oh, Ah Ran [1 ]
Park, Jungchan [1 ]
Kang, Ryung A. [1 ]
Yoo, Seung Yeon [1 ]
Lee, Dong Jae [1 ]
Yang, Kwangmo [2 ]
Affiliations
[1] Sungkyunkwan Univ, Sch Med, Samsung Med Ctr, Dept Anesthesiol & Pain Med, Seoul, South Korea
[2] Sungkyunkwan Univ, Ctr Hlth Promot, Samsung Med Ctr, Sch Med, Seoul, South Korea
Keywords
ChatGPT; artificial intelligence; quality; quantity; AI chatbot
DOI
10.3389/fmed.2024.1400153
Chinese Library Classification
R5 [Internal Medicine]
Discipline Classification Code
1002; 100201
Abstract
Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide information quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were input into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired-sample t-test compared ChatGPT 3.5 and 4.0.

Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points was the most common rating for 3.5 (36%), while 4 points predominated for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in all three areas.

Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
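The Methods section compares matched rater scores for the two model versions with a paired-sample t-test. As a minimal sketch of that comparison (the scores below are illustrative placeholders, not the study's data, and the ten-rater lists are an assumption for brevity), the t statistic can be computed directly from the per-rater score differences:

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired-sample t statistic: mean of the matched differences
    divided by the standard error of those differences."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical overall-assessment ratings (5-point Likert) from the same raters
# for ChatGPT 3.5 and 4.0 responses to one question.
gpt35 = [3, 3, 4, 2, 3, 4, 3, 3, 2, 4]
gpt40 = [4, 4, 4, 3, 4, 4, 3, 4, 3, 5]

t = paired_t_statistic(gpt35, gpt40)  # positive t indicates higher 4.0 scores
```

The pairing matters because the same rater scores both versions of each response; testing the differences removes between-rater variability that an unpaired test would leave in.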
Pages: 9