Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review

Times cited: 1
Authors
Chen, David [1]
Alnassar, Saif Addeen [2]
Avison, Kate Elizabeth [2]
Huang, Ryan S. [1]
Raman, Srinivas [3]
Affiliations
[1] Univ Toronto, Temerty Fac Med, Toronto, ON, Canada
[2] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON, Canada
[3] BC Canc Vancouver, Dept Radiat Oncol, 600 W 10th Ave, Vancouver, BC V5Z 4E6, Canada
Keywords
artificial intelligence; chatbot; data extraction; AI; conversational agent; health information; oncology; scoping review; natural language processing; NLP; large language model; LLM; digital health; health technology; electronic health record; clinical information
DOI
10.2196/65984
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background: Natural language processing (NLP) systems for extracting data from unstructured clinical text require expert-driven input for labeled annotations and model training. The NLP competency of large language models (LLMs) can enable automated extraction of important patient characteristics from electronic health records, which is useful for accelerating cancer clinical research and informing oncology care.
Objective: This scoping review aims to map the current landscape, including definitions, frameworks, and future directions, of LLMs applied to data extraction from clinical text in oncology.
Methods: On June 2, 2024, we queried Ovid MEDLINE for primary, peer-reviewed research studies published since 2000, using oncology- and LLM-related keywords. This scoping review included studies that evaluated the performance of an LLM applied to data extraction from clinical text in oncology contexts. Study attributes and main outcomes were extracted to outline key trends in research on LLM-based data extraction.
Results: The literature search yielded 24 studies for inclusion. Most studies assessed original and fine-tuned variants of the BERT LLM (n=18, 75%), followed by the ChatGPT conversational LLM (n=6, 25%). LLMs for data extraction were most commonly applied in pan-cancer clinical settings (n=11, 46%), followed by breast (n=4, 17%) and lung (n=4, 17%) cancer contexts, and most were evaluated using multi-institution datasets (n=18, 75%). Comparing studies published in 2022-2024 with those published in 2019-2021, both the total number of studies (18 vs 6) and the proportion of studies using prompt engineering increased (5/18, 28% vs 0/6, 0%), while the proportion using fine-tuning decreased (8/18, 44.4% vs 6/6, 100%). Reported advantages of LLMs included favorable data extraction performance and reduced manual workload.
Conclusions: LLMs applied to data extraction in oncology can serve as useful automated tools that reduce the administrative burden of reviewing patient health records and free up time for patient-facing care. Recent advances in prompt engineering, fine-tuning methods, and multimodal data extraction present promising directions for future research. Further studies are needed to evaluate the performance of LLM-enabled data extraction in clinical domains beyond the training dataset and to assess the scope and integration of LLMs into real-world clinical environments.
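The counts and percentages reported in the Results can be cross-checked with a short script. This is a reader's sanity check, not part of the review's methods. It assumes that each percentage was computed against the stated denominator (24 included studies, or 18 and 6 studies for the two publication periods) and rounded to the nearest integer (44.4% to one decimal). The names `pct`, `reported`, and `TOTAL_STUDIES` are hypothetical.

```python
# Reader's consistency check of the percentages reported in the abstract.
# Assumption: each percentage = count / denominator, rounded for display.

TOTAL_STUDIES = 24  # studies included in the review
RECENT, EARLY = 18, 6  # studies published in 2022-2024 and 2019-2021


def pct(part: int, whole: int) -> float:
    """Return part/whole as an unrounded percentage."""
    return 100 * part / whole


# (label, count, denominator, percentage as stated in the abstract)
reported = [
    ("BERT and fine-tuned variants", 18, TOTAL_STUDIES, 75),
    ("ChatGPT", 6, TOTAL_STUDIES, 25),
    ("pan-cancer settings", 11, TOTAL_STUDIES, 46),
    ("breast cancer", 4, TOTAL_STUDIES, 17),
    ("lung cancer", 4, TOTAL_STUDIES, 17),
    ("multi-institution datasets", 18, TOTAL_STUDIES, 75),
    ("prompt engineering, 2022-2024", 5, RECENT, 28),
    ("prompt engineering, 2019-2021", 0, EARLY, 0),
    ("fine-tuning, 2022-2024", 8, RECENT, 44.4),
    ("fine-tuning, 2019-2021", 6, EARLY, 100),
]

for label, n, denom, stated in reported:
    computed = pct(n, denom)
    # A stated value is consistent if it lies within rounding distance (0.5).
    status = "consistent" if abs(computed - stated) < 0.5 else "inconsistent"
    print(f"{label}: {n}/{denom} = {computed:.1f}% vs stated {stated}% ({status})")

# The two publication periods together should account for every included study.
print("periods sum to total:", RECENT + EARLY == TOTAL_STUDIES)
```

Comparing against a rounding tolerance rather than exact equality avoids false mismatches, for example 11/24 = 45.8%, which the abstract reports as 46%.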
Pages: 12