On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models

Cited by: 1
Authors
Afshar, Majid [1]
Gao, Yanjun [1]
Gupta, Deepak [2]
Croxford, Emma [1]
Demner-Fushman, Dina [2]
Affiliations
[1] Univ Wisconsin, Sch Med & Publ Hlth, 750 Highland Ave, Madison, WI 53726 USA
[2] NIH, Natl Lib Med, HHS, 8600 Rockville Pike, Bethesda, MD 20894 USA
Funding
US National Institutes of Health
Keywords
Artificial intelligence; Knowledge representation (computer); Natural language processing; Unified medical language system; Evaluation methodology; Differential diagnoses;
DOI
10.1016/j.jbi.2024.104707
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) to supplant traditional systems poses questions of the quality and extent of the medical knowledge in the models' internal knowledge representations and the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing the UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and the UMLS-based metrics for generations by LLMs.
Methods: We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in the electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for the UMLS knowledge was performed by prompting the LLM to complete the diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated the UMLS graph paths and clinical notes in prompting the LLMs. The results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation.
Results: In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with the UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments.
Conclusion: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with the UMLS knowledge provides performance gains around diagnosis generation. The UMLS needs to be tailored for the task to improve the LLMs' predictions. Finding evaluation metrics that are aligned with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
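Illustrative note: the abstract reports an F1 score for completing one-hop UMLS paths but does not say how that score was computed. The sketch below shows one plausible set-based formulation, assuming each path is represented as a (relation, target concept) pair and that exact matches count as correct. The function name one_hop_f1 and the example relations and concepts are hypothetical and are not taken from the paper or from the UMLS.

```python
def one_hop_f1(predicted, gold):
    """Set-based F1 between predicted and gold one-hop paths.

    Assumption: each path is a hashable (relation, target_concept) pair
    and only exact matches count as true positives.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical paths for a single source concept (not real UMLS data).
gold = [("may_treat", "hypertension"), ("isa", "diuretic")]
predicted = [
    ("may_treat", "hypertension"),
    ("isa", "antibiotic"),
    ("may_treat", "edema"),
]

score = one_hop_f1(predicted, gold)
print(f"one-hop F1 = {score:.2f}")
```

In this toy case one of three predictions is correct (precision 1/3) and one of two gold paths is recovered (recall 1/2). A study-level score would presumably aggregate such values over many concepts, which the abstract does not detail.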
Pages: 9