Use of SNOMED CT in Large Language Models: Scoping Review

Cited by: 0
Authors
Chang, Eunsuk [1 ]
Sung, Sumi [2 ]
Affiliations
[1] Republic of Korea Air Force Aerospace Medicine Center, Cheongju, South Korea
[2] Chungbuk National University, Research Institute of Nursing Science, Department of Nursing Science, 1 Chungdae Ro, Cheongju 28644, South Korea
Funding
National Research Foundation of Singapore
Keywords
SNOMED CT; ontology; knowledge graph; large language models; natural language processing; language models; SIEVE;
DOI
10.2196/62924
Chinese Library Classification
R-058
Abstract
Background: Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed.

Objective: This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks.

Methods: Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized.

Results: The review included 37 studies. Bidirectional Encoder Representations from Transformers (BERT) and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and classification.
Only about half of the studies (19/37, 51%) provided direct performance comparisons; of these, most (17/19, 89%) reported improvements after SNOMED CT integration. The reported gains varied widely across metrics and tasks, ranging from 0.87% to 131.66%, and some studies showed no improvement or even a decline on certain performance metrics.

Conclusions: This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT's relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
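To make the first integration approach concrete (incorporating SNOMED CT concept descriptions into LLM inputs), the sketch below shows one minimal way such input augmentation could look. It is a hedged illustration, not any reviewed study's actual pipeline: the tiny in-memory concept table stands in for a real SNOMED CT release, and the `[SNOMED: ...]` augmentation format is a hypothetical convention.

```python
# Sketch of "approach 1": append SNOMED CT concept descriptions to the
# input text before passing it to an LLM. The concept table below is a
# hypothetical stand-in for a real SNOMED CT release file; the IDs and
# descriptions shown are illustrative only.

SNOMED_DESCRIPTIONS = {
    "22298006": "Myocardial infarction (disorder)",
    "38341003": "Hypertensive disorder, systemic arterial (disorder)",
}

def augment_with_snomed(text: str, concept_ids: list) -> str:
    """Return the text with matched concept descriptions appended."""
    descriptions = [
        SNOMED_DESCRIPTIONS[cid]
        for cid in concept_ids
        if cid in SNOMED_DESCRIPTIONS
    ]
    if not descriptions:
        return text  # no known concepts: leave the input unchanged
    return text + " [SNOMED: " + "; ".join(descriptions) + "]"

print(augment_with_snomed("Patient presents with chest pain.", ["22298006"]))
# -> Patient presents with chest pain. [SNOMED: Myocardial infarction (disorder)]
```

In the reviewed studies this kind of description-enriched text would typically serve as an expanded training corpus or enriched prompt; the fusion-module and retrieval-at-inference approaches differ in *where* the ontology content enters the model, not in what content (concept descriptions, relations) is drawn from SNOMED CT.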
Pages: 20