TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System
Cited by: 1
Authors:
Chen, Luming [1,2]
Qi, Yifan [3,4]
Wu, Aiping [3,4]
Deng, Lizong [3,4]
Jiang, Taijiao [1,2]
Affiliations:
[1] Guangzhou Natl Lab, Guangzhou 510005, Peoples R China
[2] Guangzhou Med Univ, Guangzhou 510182, Peoples R China
[3] Chinese Acad Med Sci & Peking Union Med Coll, Inst Syst Med, Beijing 100005, Peoples R China
[4] Suzhou Inst Syst Med, Suzhou 215123, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Cross-lingual pre-trained language model;
deep learning;
entity linking;
UMLS;
DOI:
10.1109/JBHI.2023.3315143
Chinese Library Classification:
TP [Automation Technology, Computer Technology];
Discipline classification code:
0812;
Abstract:
Medical entity normalization is an important task in medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS consists primarily of English medical terms. For languages other than English, such as Chinese, a significant challenge in normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model through contrastive learning. In this work, we propose a cross-lingual pre-trained language model called TeaBERT, which aligns synonymous Chinese and English medical entities at the concept level. In evaluation, TeaBERT outperformed previous cross-lingual language models, with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge due to the absence of well-developed medical terminology systems.
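For readers unfamiliar with the Acc@5 metric reported in the abstract, the following is a minimal sketch of how Acc@k is commonly computed for embedding-based entity mapping: each query entity is ranked against all candidate concepts by cosine similarity, and a query counts as a hit if its gold concept appears in the top k. This is an illustration under stated assumptions, not the authors' evaluation code; the function names, the toy 2-dimensional vectors, and the concept IDs `C1` to `C3` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def acc_at_k(query_vecs, gold_ids, concept_vecs, k=5):
    """Fraction of queries whose gold concept is among the k most similar concepts.

    query_vecs   -- list of query embeddings (hypothetical model outputs)
    gold_ids     -- list of gold concept IDs, aligned with query_vecs
    concept_vecs -- dict mapping concept ID -> concept embedding
    """
    if not query_vecs:
        return 0.0
    hits = 0
    for query, gold in zip(query_vecs, gold_ids):
        # Rank every candidate concept by similarity to the query, best first.
        ranked = sorted(
            concept_vecs,
            key=lambda cid: cosine(query, concept_vecs[cid]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy data: placeholder concept IDs and 2-d vectors, not real UMLS content.
concepts = {"C1": [1.0, 0.0], "C2": [0.0, 1.0], "C3": [-1.0, 0.0]}
queries = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
gold = ["C1", "C2", "C3"]

acc1 = acc_at_k(queries, gold, concepts, k=1)  # third query ranks C1 first, so it misses
acc3 = acc_at_k(queries, gold, concepts, k=3)  # with k equal to the candidate count, every query hits
```

In this toy setup two of the three queries retrieve their gold concept at rank 1, so Acc@1 is 2/3, while Acc@3 is trivially 1.0 because only three candidates exist. A real evaluation would rank against the full concept inventory with an approximate nearest-neighbour index rather than a full sort.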
Pages: 6029-6038
Number of pages: 10