On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models

Times cited: 1
Authors
Afshar, Majid [1]
Gao, Yanjun [1]
Gupta, Deepak [2]
Croxford, Emma [1]
Demner-Fushman, Dina [2]
Affiliations
[1] Univ Wisconsin, Sch Med & Publ Hlth, 750 Highland Ave, Madison, WI 53726 USA
[2] NIH, Natl Lib Med, HHS, 8600 Rockville Pike, Bethesda, MD 20894 USA
Funding
National Institutes of Health (USA);
Keywords
Artificial intelligence; Knowledge representation (computer); Natural language processing; Unified medical language system; Evaluation methodology; Differential diagnoses;
DOI
10.1016/j.jbi.2024.104707
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
Objective: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) that may supplant traditional systems raises questions about the quality and extent of the medical knowledge in the models' internal knowledge representations, and about the need for external knowledge sources. The objective of this study is threefold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and UMLS-based metrics for LLM generations.
Methods: We evaluated diagnoses generated by LLMs from consumer health questions and from daily care notes in electronic health records, using the ConsumerQA and Problem Summarization datasets. We probed the LLMs for UMLS knowledge by prompting them to complete diagnosis-related UMLS knowledge paths. We examined grounding with an approach that integrated UMLS graph paths and clinical notes into the LLM prompts, and compared the results with prompting without the UMLS paths. The final experiments examined how well different evaluation metrics, both UMLS-based and non-UMLS, aligned with human expert evaluation.
Results: In probing UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline, achieving an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with UMLS paths improved the results for both models on both tasks, with the largest improvement (4%) in SapBERT score. The widely used evaluation metrics (ROUGE and SapBERT) correlated only weakly with human judgments.
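The abstract reports an F1 score for completing one-hop UMLS paths but does not say how predictions were matched to the UMLS. The sketch below shows one plausible set-based scoring under stated assumptions: for a query (head concept, relation, ?), the LLM's proposed tail concepts are compared with the tail concepts the UMLS lists for that head and relation, using exact matching after case-folding. The function names `path_completion_f1` and `mean_f1` are hypothetical, and the study may instead match on UMLS concept identifiers (CUIs) or use a different averaging scheme.

```python
from typing import Iterable, List, Set, Tuple


def _normalize(concepts: Iterable[str]) -> Set[str]:
    # Assumption: concepts are compared as case-folded, whitespace-trimmed
    # strings; a real evaluation might match on UMLS CUIs instead.
    return {c.strip().casefold() for c in concepts if c.strip()}


def path_completion_f1(predicted: Iterable[str],
                       gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one (head, relation, ?) query."""
    pred, ref = _normalize(predicted), _normalize(gold)
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    hits = len(pred & ref)  # predicted tails that also appear among gold tails
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_f1(examples: List[Tuple[List[str], List[str]]]) -> float:
    """Average per-query F1 over (predicted, gold) pairs (hypothetical scheme)."""
    if not examples:
        return 0.0
    return sum(path_completion_f1(p, g)[2] for p, g in examples) / len(examples)
```

For instance, with two predicted tails of which one matches one of three gold tails (hypothetical strings), precision is 0.5, recall is 1/3, and F1 is 0.4. Set-based matching penalizes both missed gold concepts and extra predictions, which is one reason path-completion F1 can be low even when some predictions are correct.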
Conclusion: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge provides performance gains in diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
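Judging whether a metric is "aligned with human judgments" usually comes down to a correlation between per-item metric scores and per-item human ratings. The abstract does not name the coefficient used; the sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of tie-averaged ranks. The function names are hypothetical, and the inputs stand in for, e.g., ROUGE or SapBERT scores paired with expert ratings for the same generated diagnoses.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho between metric scores and human ratings (assumed choice)."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A rank correlation is a common choice here because human ratings are typically ordinal, and it asks only whether the metric orders outputs the way experts do, not whether the two scales are linearly related.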
Pages: 9