Use of artificial intelligence chatbots in clinical management of immune-related adverse events

Cited by: 9
Authors
Burnette, Hannah [1]
Pabani, Aliyah [2]
von Itzstein, Mitchell S. [3]
Switzer, Benjamin [4]
Fan, Run [5]
Ye, Fei [5]
Puzanov, Igor [4]
Naidoo, Jarushka [6]
Ascierto, Paolo A. [7]
Gerber, David E. [3]
Ernstoff, Marc S. [8]
Johnson, Douglas B. [1]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN 37235 USA
[2] Johns Hopkins Univ, Dept Oncol, Baltimore, MD USA
[3] Univ Texas Southwestern Med Ctr, Harold C Simmons Comprehens Canc Ctr, Dallas, TX USA
[4] Roswell Park Comprehens Canc Ctr, Dept Med, Buffalo, NY USA
[5] Vanderbilt Univ, Med Ctr, Dept Biostat, Nashville, TN USA
[6] Beaumont Hosp, RCSI Canc Ctr, Dublin, Ireland
[7] Ist Nazl Tumori IRCCS Fdn Pascale, Dept Melanoma Canc Immunotherapy & Dev Therapeut, Naples, Campania, Italy
[8] NCI, ImmunoOncol Branch IOB, Dev Therapeut Program, Canc Therapy & Diag Div,NIH, Bethesda, MD USA
Keywords
Immune Checkpoint Inhibitor; Immune-related adverse event (irAE); Thyroiditis; Colitis; Pneumonitis; ChatGPT
DOI
10.1136/jitc-2023-008599
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background: Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility in answering questions about immune-related adverse events (irAEs), which are common and potentially dangerous toxicities of cancer immunotherapy, is not well defined.
Methods: We developed 50 distinct questions covering 10 irAE categories, each with an answer available in published guidelines, and posed them to two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored the answers for accuracy and completeness on a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines.
Results: Overall, both engines scored highly for accuracy (mean scores 3.87 vs 3.5 for ChatGPT and Bard, respectively, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (accuracy) and 16 questions (completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4).
Conclusions: AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
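The 800 answer-ratings reported for ChatGPT are consistent with 50 questions, each scored by eight raters on two dimensions (accuracy and completeness). That breakdown is an inference from the figures in the abstract, not something the abstract states outright. A minimal Python sketch of the arithmetic, with all names chosen here for illustration:

```python
# Assumption: the 800 answer-ratings are 50 questions x 8 raters x 2 dimensions.
# The abstract reports the total (800) and the count of low scores (6) only.
QUESTIONS = 50
RATERS = 8
DIMENSIONS = ("accuracy", "completeness")


def low_score_percentage(low_ratings: int, total_ratings: int) -> float:
    """Return the percentage of ratings that were scored 1 or 2."""
    if total_ratings <= 0:
        raise ValueError("total_ratings must be positive")
    return 100 * low_ratings / total_ratings


total_ratings = QUESTIONS * RATERS * len(DIMENSIONS)
share = low_score_percentage(6, total_ratings)
print(f"{total_ratings} ratings, {share:.2f}% scored 1-2")
```

Under that assumption the total and the percentage both match the values given in the abstract.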
Pages: 5