Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology

Cited by: 37
Authors
Huang, Yixing [1, 2]
Gomaa, Ahmed [1, 2]
Semrau, Sabine [1, 2]
Haderlein, Marlen [1, 2]
Lettmaier, Sebastian [1, 2]
Weissmann, Thomas [1, 2]
Grigo, Johanna [1, 2]
Tkhayat, Hassen Ben [1, 3]
Frey, Benjamin [1, 2]
Gaipl, Udo [1, 2]
Distel, Luitpold [1, 2]
Maier, Andreas [3]
Fietkau, Rainer [1, 2]
Bert, Christoph [1, 2]
Putz, Florian [1, 2]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Univ Hosp Erlangen, Dept Radiat Oncol, Erlangen, Germany
[2] Comprehens Canc Ctr Erlangen EMN CCC ER EMN, Erlangen, Germany
[3] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Erlangen, Germany
Source
FRONTIERS IN ONCOLOGY | 2023 / Vol. 13
Keywords
large language model; radiotherapy; natural language processing; artificial intelligence; Gray Zone; clinical decision support (CDS); CANCER; THERAPY
DOI
10.3389/fonc.2023.1265024
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: Large language models have shown potential in medicine for education and decision-making, achieving decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. This work evaluates the performance of ChatGPT-4 in the specialized field of radiation oncology.
Methods: The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases are used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics of radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases.
Results: On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the latest ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, by ACR knowledge domain, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it provides novel treatment aspects that were not suggested by any of the human experts.
Conclusion: Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucinations, it is essential to verify content generated by models such as ChatGPT for accuracy.
Pages: 13