BioBridge: Unified Bio-Embedding With Bridging Modality in Code-Switched EMR

Cited by: 0

Authors:

Jeon, Jangyeong ^{[1]}

Cho, Sangyeon ^{[1]}

Lee, Dongjoon ^{[1]}

Lee, Changhee ^{[2]}

Kim, Junyeong ^{[1]}

Affiliations:

[1] Chung Ang Univ, Dept Artificial Intelligence, Seoul 06974, South Korea

[2] Korea Univ, Dept Artificial Intelligence, Seoul 02841, South Korea

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Natural language processing; Feature extraction; Biological system modeling; Electronic medical records; Training; Encoding; Unified modeling language; Pediatrics; Emergency services; code-switching; electronic medical record; emergency department; pediatric emergency department; LANGUAGE; UMLS;

DOI:

10.1109/ACCESS.2024.3467251

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Pediatric Emergency Department (PED) overcrowding presents a significant global challenge, prompting the need for efficient solutions. This paper introduces the BioBridge framework, a novel approach that applies Natural Language Processing (NLP) to Electronic Medical Records (EMRs) written in free-text form to enhance decision-making in the PED. In non-English-speaking countries such as South Korea, EMR data is often written in a Code-Switching (CS) format that mixes the native language with English, and most of the code-switched English words carry clinical significance. The BioBridge framework consists of two core modules: "bridging modality in context" and "unified bio-embedding." The "bridging modality in context" module improves the contextual understanding of bilingual and code-switched EMRs. In the "unified bio-embedding" module, knowledge from a model trained in the medical domain is injected into the encoder-based model to bridge the gap between the medical and general domains. Experimental results demonstrate that the proposed BioBridge significantly outperforms traditional machine learning and pre-trained encoder-based models on several metrics, including F1 score, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and Brier score. Specifically, BioBridge-XLM achieved improvements of 0.85% in F1 score, 0.75% in AUROC, and 0.76% in AUPRC, along with a notable 3.04% decrease in the Brier score, demonstrating marked improvements in accuracy, reliability, and prediction calibration over the baseline XLM model. The source code will be made publicly available at https://github.com/jjy961228/BioBridge.
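The four metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and probabilities below are all assumptions made for illustration, and the abstract does not state how the paper computes each metric. Higher is better for F1, AUROC, and AUPRC; lower is better for the Brier score, which is why a decrease is reported as an improvement in calibration.

```python
# Illustrative only: hypothetical helpers and toy data, not taken from the paper.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count as half). Assumes both classes are present."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def auprc(y_true, y_prob):
    """Average precision: mean of the precision at the rank of each positive.
    A common estimator of the area under the precision-recall curve;
    tied scores are not handled specially in this sketch."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Toy example (assumed values): four cases with predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("F1   :", round(f1_score(y_true, y_pred), 4))
print("AUROC:", round(auroc(y_true, y_prob), 4))
print("AUPRC:", round(auprc(y_true, y_prob), 4))
print("Brier:", round(brier_score(y_true, y_prob), 4))
```

F1 depends on the chosen threshold, whereas AUROC, AUPRC, and the Brier score are computed directly from the predicted probabilities.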

Citation

Pages: 141866-141877

Page count: 12