Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions

Cited by: 0
Authors
Sarrias, Oskitz Ruiz [1 ]
del Prado, Maria Purificacion Martinez [2 ]
Gonzalez, Maria Angeles Sala [2 ]
Sagarduy, Josune Azcuna [2 ]
Cuesta, Pablo Casado [2 ]
Berjano, Covadonga Figaredo [2 ]
Galve-Calvo, Elena [2 ]
Hernandez, Borja Lopez de San Vicente [2 ]
Lopez-Santillan, Maria [2 ]
Escolastico, Maitane Nuno [2 ]
Togneri, Laura Sanchez [2 ]
Sardina, Laura Sande [2 ]
Hoyos, Maria Teresa Perez [2 ]
Villar, Maria Teresa Abad [2 ]
Zudaire, Maialen Zabalza [1 ]
Beristain, Onintza Sayar [1 ]
Affiliations
[1] NNBi 2020 SL, Dept Math & Stat, Noain 31110, Navarra, Spain
[2] Basurto Univ Hosp, Med Oncol Serv, OSI Bilbao Basurto, Osakidetza, Bilbao 48013, Biscay, Spain
Keywords
Large Language Models; artificial intelligence; oncology; clinical practice; chemotherapy; subjective toxicities; medical oncology; patient monitoring; PATIENT-REPORTED OUTCOMES; CARE
DOI
10.3390/cancers16162830
Chinese Library Classification
R73 [Oncology]
Subject Classification Code
100214
Abstract
Simple Summary: This study evaluated the ability of Large Language Models (LLMs) to classify subjective chemotherapy-induced toxicities by comparing them with expert oncologists. Using fictitious cases, it demonstrated that LLMs can achieve accuracy similar to that of oncologists in general toxicity categories, although they need improvement in specific categories. LLMs show great potential for enhancing patient monitoring and reducing physicians' workload. Future research should focus on training LLMs specifically for medical tasks and on validating these findings with real patients, always ensuring accuracy and ethical data management.

Introduction: Large Language Models (LLMs), such as OpenAI's GPT family, have demonstrated transformative potential across many fields, especially medicine. These models can understand and generate contextual text and adapt to new tasks without task-specific training. This versatility could reshape clinical practice by improving documentation, patient interaction, and decision-making. In oncology, LLMs could significantly improve patient care through the continuous monitoring of chemotherapy-induced toxicities, a task that is often unmanageable with human resources alone. However, existing research has not sufficiently explored the accuracy of LLMs in identifying and assessing subjective toxicities from patient descriptions. This study aims to fill that gap by evaluating the ability of LLMs to classify these toxicities accurately, facilitating personalized and continuous patient care.

Methods: This comparative pilot study assessed the ability of an LLM to classify subjective chemotherapy-induced toxicities. Thirteen oncologists evaluated 30 fictitious cases created using expert knowledge and OpenAI's GPT-4. Their evaluations, based on the CTCAE v5.0 criteria, were compared with those of a contextualized LLM. Metrics such as the mode and mean of the responses were used to gauge consensus. The LLM's accuracy was analyzed for both general and specific toxicity categories, considering error types and false alarms. The results are intended to justify further research involving real patients.

Results: The study revealed considerable variability in the oncologists' evaluations, attributable to the lack of interaction with the fictitious patients. Measured against the mean evaluations, the LLM achieved an accuracy of 85.7% in general categories and 64.6% in specific categories; 96.4% of its errors were mild and 3.6% severe. False alarms occurred in 3% of cases. Among the expert oncologists, individual accuracy ranged from 66.7% to 89.2% for general categories and from 57.0% to 76.0% for specific categories. The 95% confidence intervals for the median oncologist accuracy were 81.9% to 86.9% for general categories and 67.6% to 75.6% for specific categories. These benchmarks highlight the LLM's potential to reach expert-level performance in classifying chemotherapy-induced toxicities.

Discussion: The findings show that LLMs can classify subjective chemotherapy-induced toxicities with accuracy comparable to that of expert oncologists. While the model's performance in general categories falls within the expert range, its accuracy in specific categories requires improvement. The study's limitations include the use of fictitious cases, the lack of patient interaction, and the reliance on audio transcriptions. Nevertheless, LLMs show significant potential for enhancing patient monitoring and reducing oncologists' workload. Future research should focus on training LLMs specifically for medical tasks, conducting studies with real patients, implementing interactive evaluations, expanding sample sizes, and ensuring robustness and generalization across diverse clinical settings.

Conclusions: This study concludes that LLMs can classify subjective chemotherapy-induced toxicities with accuracy comparable to that of expert oncologists. The LLM's performance in general toxicity categories is within the expert range, but there is room for improvement in specific categories. LLMs have the potential to enhance patient monitoring, enable early interventions, and reduce severe complications, improving the quality and efficiency of care. Future work should involve task-specific training of LLMs, validation with real patients, and the incorporation of interactive capabilities for real-time patient interactions. Ethical considerations, including data accuracy, transparency, and privacy, are crucial for the safe integration of LLMs into clinical practice.
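The consensus and accuracy metrics described in the Methods (mode of expert responses, per-rater accuracy, bootstrap-style confidence interval for the median) can be sketched as follows. This is a minimal illustration with made-up CTCAE-style grades for ten cases; the grade values, the bootstrap resample count, and all variable names are assumptions, not the study's actual data or analysis pipeline.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: CTCAE grades (0-4) for 10 fictitious cases.
# Each row holds the grades assigned by 13 oncologists; llm_grades
# holds the model's output. These numbers are illustrative only.
oncologist_grades = [
    [random.choice([g, g, g, min(g + 1, 4)]) for _ in range(13)]
    for g in [2, 1, 3, 0, 2, 4, 1, 2, 3, 0]
]
llm_grades = [2, 1, 3, 0, 2, 3, 1, 2, 3, 0]

# Consensus per case: the mode of the oncologists' grades.
consensus = [statistics.mode(case) for case in oncologist_grades]

# LLM accuracy measured against the consensus.
llm_acc = sum(l == c for l, c in zip(llm_grades, consensus)) / len(consensus)

# Per-oncologist accuracy against the consensus, then a bootstrap
# 95% confidence interval for the median oncologist accuracy.
onc_accs = [
    sum(case[i] == c for case, c in zip(oncologist_grades, consensus))
    / len(consensus)
    for i in range(13)
]
boot_medians = sorted(
    statistics.median(random.choices(onc_accs, k=len(onc_accs)))
    for _ in range(2000)
)
ci_low, ci_high = boot_medians[49], boot_medians[1949]

print(f"LLM accuracy vs. consensus: {llm_acc:.1%}")
print(f"Median oncologist accuracy 95% CI: [{ci_low:.1%}, {ci_high:.1%}]")
```

Comparing the LLM's accuracy to the oncologists' bootstrap interval, as in the Results, then shows whether the model falls within the expert range.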
Pages: 15