New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

Cited by: 42
Authors
Huynh, Linda My [1 ]
Bonebrake, Benjamin T. [2 ]
Schultis, Kaitlyn [2 ]
Quach, Alan [3 ]
Deibert, Christopher M. [3 ,4 ]
Affiliations
[1] Univ Nebraska Med Ctr, Omaha, NE USA
[2] Univ Nebraska Med Ctr, Coll Med, Omaha, NE USA
[3] Univ Nebraska Med Ctr, Div Urol, Omaha, NE USA
[4] Univ Nebraska Med Ctr, Dept Surg, Div Urol, 987521 Nebraska Med Ctr, Omaha, NE 68198 USA
Keywords
artificial intelligence; medical informatics applications; urology;
DOI
10.1097/UPJ.0000000000000406
Chinese Library Classification (CLC)
R5 [Internal Medicine]; R69 [Urology (Genitourinary Diseases)];
Discipline classification code
1002; 100201;
Abstract
Introduction: Large language models have demonstrated impressive capabilities, but their application to medicine remains unclear. We sought to evaluate the use of ChatGPT on the American Urological Association Self-assessment Study Program as an educational adjunct for urology trainees and practicing physicians.

Methods: One hundred fifty questions from the 2022 Self-assessment Study Program exam were screened, and those containing visual assets (n=15) were removed. The remaining items were encoded as open ended or multiple choice. ChatGPT's output was coded as correct, incorrect, or indeterminate; if indeterminate, responses were regenerated up to 2 times. Concordance, quality, and accuracy were ascertained by 3 independent researchers and reviewed by 2 physician adjudicators. A new session was started for each entry to avoid crossover learning.

Results: ChatGPT was correct on 36/135 (26.7%) open-ended and 38/135 (28.2%) multiple-choice questions. Indeterminate responses were generated in 40 (29.6%) and 4 (3.0%), respectively. Of the correct responses, 24/36 (66.7%) and 36/38 (94.7%) were on initial output, 8 (22.2%) and 1 (2.6%) on second output, and 4 (11.1%) and 1 (2.6%) on final output, respectively. Although regeneration decreased indeterminate responses, the proportion of correct responses did not increase. For open-ended and multiple-choice questions, ChatGPT provided consistent justifications for incorrect answers and remained concordant between correct and incorrect answers.

Conclusions: ChatGPT previously demonstrated promise on medical licensing exams; however, similar performance on the 2022 Self-assessment Study Program was not demonstrated. Performance improved with multiple-choice over open-ended questions. More important were the persistent justifications for incorrect responses; left unchecked, utilization of ChatGPT in medicine may facilitate medical misinformation.
Pages: 408 / +
Page count: 8
Related Articles
6 items
  • [1] Google Bard Artificial Intelligence vs the 2022 Self-Assessment Study Program for Urology
    Huynh, Linda My
    Bonebrake, Benjamin T.
    Schultis, Kaitlyn
    Quach, Alan
    Deibert, Christopher M.
    UROLOGY PRACTICE, 2023, 10 (06)
  • [2] Artificial Intelligence on the Exam Table: ChatGPT's Advancement in Urology Self-assessment
    Cadiente, Angelo
    Chen, Jamie
    Nguyen, Jennifer
    Sadeghi-Nejad, Hossein
    Billah, Mubashir
    UROLOGY PRACTICE, 2023, 10 (06) : 521 - 523
  • [3] Self-assessment of university students on the application and potential of Artificial Intelligence for their formation
    Aguilar, Nivia T. Alvarez
    Cubero, Arnulfo Trevino
    Elizondo, Jaime Arturo Castillo
    ATENAS, 2024, (62)
  • [4] The performance of artificial intelligence language models in board-style dental knowledge assessment: A preliminary study on ChatGPT
    Danesh, Arman
    Pazouki, Hirad
    Danesh, Kasra
    Danesh, Farzad
    Danesh, Arsalan
    JOURNAL OF THE AMERICAN DENTAL ASSOCIATION, 2023, 154 (11) : 970 - 974
  • [5] Impact of a Commercial Artificial Intelligence-Driven Patient Self-Assessment Solution on Waiting Times at General Internal Medicine Outpatient Departments: Retrospective Study
    Harada, Yukinori
    Shimizu, Taro
    JMIR MEDICAL INFORMATICS, 2020, 8 (08)
  • [6] Performance assessment of artificial intelligence chatbots (ChatGPT-4 and Copilot) for sharing insights on 3D-printed orthodontic appliances: A cross-sectional study
    Yousuf, Asma Muhammad
    Ikram, Fizzah
    Gulzar, Munnal
    Sukhia, Rashna Hoshang
    Fida, Mubassar
    INTERNATIONAL ORTHODONTICS, 2025, 23 (03)