TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System

Cited by: 1

Authors:

Chen, Luming [1,2]

Qi, Yifan [3,4]

Wu, Aiping [3,4]

Deng, Lizong [3,4]

Jiang, Taijiao [1,2]

Affiliations:

[1] Guangzhou National Laboratory, Guangzhou 510005, People's Republic of China

[2] Guangzhou Medical University, Guangzhou 510182, People's Republic of China

[3] Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, People's Republic of China

[4] Suzhou Institute of Systems Medicine, Suzhou 215123, People's Republic of China

Source:

IEEE Journal of Biomedical and Health Informatics | 2023 / Vol. 27 / No. 12

Funding:

National Natural Science Foundation of China

Keywords:

Cross-lingual pre-trained language model; deep learning; entity linking; UMLS

DOI:

10.1109/JBHI.2023.3315143

Chinese Library Classification number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Medical entity normalization is an important task in medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for this task. However, the UMLS consists primarily of English medical terms. For languages other than English, such as Chinese, the lack of robust terminology systems is a significant obstacle to normalizing medical entities. To address this issue, we propose a translation-enhancing training strategy that uses contrastive learning to incorporate the translation and synonym knowledge of the UMLS into a language model. With this strategy, we train a cross-lingual pre-trained language model called TeaBERT, which aligns synonymous Chinese and English medical entities at the concept level. In our evaluation, TeaBERT outperformed previous cross-lingual language models, with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved a new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge because they lack well-developed medical terminology systems.
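
The Acc@5 values in the abstract are top-k accuracies. As a minimal sketch, assuming the common definition (a mention counts as correct when its gold concept appears among the k highest-ranked candidates), the metric could be computed as below. The function name `acc_at_k` and the toy concept IDs are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def acc_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of mentions whose gold concept ID is in the top-k candidates.

    ranked_candidates[i] holds the candidate concept IDs for mention i,
    ordered from most to least similar. This is an assumed, generic
    definition of Acc@k, not the paper's evaluation script.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("ranked_candidates and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        1 for cands, target in zip(ranked_candidates, gold) if target in cands[:k]
    )
    return hits / len(gold)


# Toy data with made-up concept IDs (hypothetical, for illustration only).
predictions = [
    ["c1", "c2", "c3"],
    ["c4", "c5", "c6"],
    ["c7", "c8", "c9"],
]
gold = ["c2", "c6", "c0"]

for k in (1, 2, 3):
    print(f"Acc@{k} = {acc_at_k(predictions, gold, k=k):.4f}")
```

In a setting like the one the abstract describes, the ranked candidates would typically come from nearest-neighbor search over entity embeddings, but that retrieval step is outside this sketch.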


Pages: 6029-6038

Page count: 10
