TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System

Cited by: 1
Authors
Chen, Luming [1, 2]
Qi, Yifan [3, 4]
Wu, Aiping [3, 4]
Deng, Lizong [3, 4]
Jiang, Taijiao [1, 2]
Affiliations
[1] Guangzhou Natl Lab, Guangzhou 510005, Peoples R China
[2] Guangzhou Med Univ, Guangzhou 510182, Peoples R China
[3] Chinese Acad Med Sci & Peking Union Med Coll, Inst Syst Med, Beijing 100005, Peoples R China
[4] Suzhou Inst Syst Med, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual pre-trained language model; deep learning; entity linking; UMLS
DOI
10.1109/JBHI.2023.3315143
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Medical entity normalization is an important task in medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS consists primarily of English medical terms. For languages other than English, such as Chinese, a significant challenge in normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model through contrastive learning. In this work, we propose a cross-lingual pre-trained language model called TeaBERT, which aligns synonymous Chinese and English medical entities across languages at the concept level. In the evaluation, TeaBERT outperformed previous cross-lingual language models, with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge due to the absence of well-developed medical terminology systems.
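The Acc@5 figures in the abstract are top-k retrieval accuracies: a mention counts as correct when its gold concept appears among the k highest-ranked candidates. The sketch below illustrates that metric under stated assumptions; it is not the authors' evaluation code. The function names (`cosine`, `acc_at_k`), the use of cosine similarity for ranking, and the toy vectors and concept identifiers are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def acc_at_k(query_vecs, gold_ids, concept_vecs, concept_ids, k=5):
    """Fraction of queries whose gold concept is among the k nearest concepts."""
    hits = 0
    for query, gold in zip(query_vecs, gold_ids):
        # Rank every candidate concept by similarity to the query embedding.
        ranked = sorted(
            range(len(concept_vecs)),
            key=lambda i: cosine(query, concept_vecs[i]),
            reverse=True,
        )
        top_k = {concept_ids[i] for i in ranked[:k]}
        hits += gold in top_k
    return hits / len(gold_ids)

# Toy data (made up): three candidate concepts and two query mentions.
concept_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
concept_ids = ["C1", "C2", "C3"]   # hypothetical concept identifiers
query_vecs = [[0.9, 0.1], [0.1, 0.9]]
gold_ids = ["C1", "C3"]

acc1 = acc_at_k(query_vecs, gold_ids, concept_vecs, concept_ids, k=1)
acc2 = acc_at_k(query_vecs, gold_ids, concept_vecs, concept_ids, k=2)
```

In a real setting the vectors would come from the language model's entity embeddings and the candidate set would be the full terminology, so an approximate nearest-neighbour index would likely replace the exhaustive sort used here.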
Pages: 6029-6038
Page count: 10