Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology

Cited by: 37
Authors
Huang, Yixing [1, 2]
Gomaa, Ahmed [1, 2]
Semrau, Sabine [1, 2]
Haderlein, Marlen [1, 2]
Lettmaier, Sebastian [1, 2]
Weissmann, Thomas [1, 2]
Grigo, Johanna [1, 2]
Tkhayat, Hassen Ben [1, 3]
Frey, Benjamin [1, 2]
Gaipl, Udo [1, 2]
Distel, Luitpold [1, 2]
Maier, Andreas [3]
Fietkau, Rainer [1, 2]
Bert, Christoph [1, 2]
Putz, Florian [1, 2]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Univ Hosp Erlangen, Dept Radiat Oncol, Erlangen, Germany
[2] Comprehens Canc Ctr Erlangen EMN CCC ER EMN, Erlangen, Germany
[3] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Erlangen, Germany
Source
FRONTIERS IN ONCOLOGY | 2023 / Volume 13
Keywords
large language model; radiotherapy; natural language processing; artificial intelligence; Gray Zone; clinical decision support (CDS); CANCER; THERAPY;
DOI
10.3389/fonc.2023.1265024
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: The potential of large language models in medicine for education and decision-making purposes has been demonstrated, as they have achieved decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology.

Methods: The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases are used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics of radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases.

Results: For the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the latest ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology, according to the ACR knowledge domains. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it provides novel treatment aspects that were not suggested by any of the human experts.

Conclusion: Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucinations, it is essential to verify the content generated by models such as ChatGPT for accuracy.
Pages: 13