On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models

Cited by: 1
Authors:
Afshar, Majid [1 ]
Gao, Yanjun [1 ]
Gupta, Deepak [2 ]
Croxford, Emma [1 ]
Demner-Fushman, Dina [2 ]
Affiliations:
[1] Univ Wisconsin, Sch Med & Publ Hlth, 750 Highland Ave, Madison, WI 53726 USA
[2] NIH, Natl Lib Med, HHS, 8600 Rockville Pike, Bethesda, MD 20894 USA
Funding:
U.S. National Institutes of Health;
Keywords:
Artificial intelligence; Knowledge representation (computer); Natural language processing; Unified medical language system; Evaluation methodology; Differential diagnoses;
DOI
10.1016/j.jbi.2024.104707
CLC number:
TP39 [Computer applications];
Subject classification codes:
081203; 0835;
Abstract:
Objective: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) as potential replacements for traditional systems raises questions about the quality and extent of the medical knowledge in the models' internal representations and about the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and UMLS-based metrics for LLM generations.

Methods: We evaluated diagnoses generated by LLMs from consumer health questions and from daily care notes in electronic health records, using the ConsumerQA and Problem Summarization datasets. LLMs were probed for UMLS knowledge by prompting them to complete diagnosis-related UMLS knowledge paths. Grounding was examined in an approach that integrated UMLS graph paths and clinical notes when prompting the LLMs; the results were compared with prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation.

Results: In probing for UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline, yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was only a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments.

Conclusion: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge yields performance gains for diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
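As a minimal illustration of the grounding approach described in the abstract (a sketch only; the prompt wording, the UMLS path format, the helper names, and the example inputs are assumptions rather than the authors' exact implementation), one-hop UMLS paths could be concatenated with a clinical note before prompting an LLM for candidate diagnoses:

    # Hypothetical Python sketch of UMLS-grounded prompting for diagnosis generation.
    # The path format and prompt text are illustrative assumptions, not the paper's prompts.

    def format_umls_paths(paths):
        # Each path is a (source_concept, relation, target_concept) triple,
        # e.g. ("Dyspnea", "may_be_finding_of", "Congestive heart failure").
        return "\n".join(f"{s} --{r}--> {t}" for s, r, t in paths)

    def build_grounded_prompt(clinical_note, umls_paths):
        # Combine the daily care note with UMLS knowledge paths so the model
        # conditions its differential diagnoses on the external knowledge.
        return (
            "Relevant UMLS knowledge paths:\n"
            + format_umls_paths(umls_paths)
            + "\n\nClinical note:\n"
            + clinical_note
            + "\n\nList the most likely diagnoses, one per line."
        )

    # Example usage with made-up inputs; the resulting prompt would be sent to
    # an LLM such as GPT-3.5 or Llama2 through its chat-completion API.
    paths = [("Dyspnea", "may_be_finding_of", "Congestive heart failure")]
    note = "72-year-old with worsening shortness of breath and leg swelling."
    prompt = build_grounded_prompt(note, paths)

Similarly, the SapBERT score reported in the Results could be approximated by embedding generated and reference diagnoses with a SapBERT encoder (e.g., cambridgeltl/SapBERT-from-PubMedBERT-fulltext) and computing cosine similarity between the embeddings, although the exact scoring procedure used in the paper may differ.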
Pages: 9
Related papers
(50 in total)
  • [31] Large language models in science
    Kowalewski, Karl-Friedrich
    Rodler, Severin
    UROLOGIE, 2024, 63 (09): 860 - 866
  • [32] Frontiers: Supporting Content Marketing with Natural Language Generation
    Reisenbichler, Martin
    Reutterer, Thomas
    Schweidel, David A.
    Dan, Daniel
    MARKETING SCIENCE, 2022, 41 (03) : 441 - 452
  • [33] Large language models (LLMs) as agents for augmented democracy
    Gudino, Jairo F.
    Grandi, Umberto
    Hidalgo, Cesar
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2024, 382 (2285):
  • [34] A Critical Review of Methods and Challenges in Large Language Models
    Moradi, Milad
    Yan, Ke
    Colwell, David
    Samwald, Matthias
    Asgari, Rhona
    CMC-COMPUTERS MATERIALS & CONTINUA, 2025, 82 (02): 1681 - 1698
  • [35] Large Language Models in Oncology: Revolution or Cause for Concern?
    Caglayan, Aydin
    Slusarczyk, Wojciech
    Rabbani, Rukhshana Dina
    Ghose, Aruni
    Papadopoulos, Vasileios
    Boussios, Stergios
    CURRENT ONCOLOGY, 2024, 31 (04) : 1817 - 1830
  • [36] Applications of large language models in oncology [Einsatzmöglichkeiten von „large language models“ in der Onkologie]
    Loeffler, Chiara M.
    Bressem, Keno K.
    Truhn, Daniel
    DIE ONKOLOGIE, 2024, 30 (5): 388 - 393
  • [37] Transforming Informed Consent Generation Using Large Language Models: Mixed Methods Study
    Shi, Qiming
    Luzuriaga, Katherine
    Allison, Jeroan J.
    Oztekin, Asil
    Faro, Jamie M.
    Lee, Joy L.
    Hafer, Nathaniel
    Mcmanus, Margaret
    Zai, Adrian H.
    JMIR MEDICAL INFORMATICS, 2025, 13
  • [38] Distractor Generation for Multiple-Choice Questions with Predictive Prompting and Large Language Models
    Bitew, Semere Kiros
    Deleu, Johannes
    Develder, Chris
    Demeester, Thomas
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II, 2025, 2134 : 48 - 63
  • [39] A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models
    Frankford, Eduard
    Hoehn, Ingo
    Sauerwein, Clemens
    Breu, Ruth
    2024 36TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING EDUCATION AND TRAINING, CSEE & T 2024, 2024,
  • [40] Enhancing textual textbook question answering with large language models and retrieval augmented generation
    Alawwad, Hessa A.
    Alhothali, Areej
    Naseem, Usman
    Alkhathlan, Ali
    Jamal, Amani
    PATTERN RECOGNITION, 2025, 162