TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System

Times cited: 1
Authors
Chen, Luming [1 ,2 ]
Qi, Yifan [3 ,4 ]
Wu, Aiping [3 ,4 ]
Deng, Lizong [3 ,4 ]
Jiang, Taijiao [1 ,2 ]
Affiliations
[1] Guangzhou Natl Lab, Guangzhou 510005, Peoples R China
[2] Guangzhou Med Univ, Guangzhou 510182, Peoples R China
[3] Chinese Acad Med Sci & Peking Union Med Coll, Inst Syst Med, Beijing 100005, Peoples R China
[4] Suzhou Inst Syst Med, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual pre-trained language model; deep learning; entity linking; UMLS
DOI
10.1109/JBHI.2023.3315143
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Medical entity normalization is an important task in medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS consists primarily of English medical terms. For other languages, such as Chinese, a significant challenge in normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that uses contrastive learning to incorporate the translation and synonym knowledge of the UMLS into a language model. In this work, we propose TeaBERT, a cross-lingual pre-trained language model that can align synonymous Chinese and English medical entities at the concept level. In our evaluation, TeaBERT outperformed previous cross-lingual language models, achieving Acc@5 values of 92.54%, 87.14%, and 84.77% on the ICD10-CN, CHPO, and RealWorld-v2 datasets, respectively. It also achieved new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge owing to the absence of well-developed medical terminology systems.
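The abstract names two technical ingredients: a contrastive objective that pulls synonymous Chinese and English terms together in embedding space, and mapping entities to UMLS concepts by embedding similarity without fine-tuning, scored with Acc@5. The stdlib-only Python below is a minimal sketch of both, under stated assumptions. The abstract does not give TeaBERT's exact loss, so a generic InfoNCE loss with cosine similarity and in-batch negatives stands in for it. The embeddings are toy two-dimensional vectors rather than model outputs, and the concept IDs and all function names (`info_nce_loss`, `rank_concepts`, `acc_at_k`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Generic InfoNCE loss (an assumed stand-in, not TeaBERT's published loss).

    anchors[i] and positives[i] form a synonym pair, e.g. a Chinese term and
    an English UMLS name of the same concept; every other positive in the
    batch acts as an in-batch negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the true synonym.
        total += log_denominator - logits[i]
    return total / len(anchors)


def rank_concepts(query: Vector, concept_index: Dict[str, Vector]) -> List[str]:
    """Rank concept IDs by cosine similarity to the query (no fine-tuning)."""
    return sorted(concept_index,
                  key=lambda cid: cosine(query, concept_index[cid]),
                  reverse=True)


def acc_at_k(queries: List[Tuple[Vector, str]],
             concept_index: Dict[str, Vector], k: int = 5) -> float:
    """Fraction of queries whose gold concept ID is among the top-k candidates."""
    hits = sum(1 for vec, gold in queries
               if gold in rank_concepts(vec, concept_index)[:k])
    return hits / len(queries)


# Toy usage with hypothetical concept IDs and hand-made vectors.
concepts = {"C_A": [1.0, 0.0], "C_B": [0.0, 1.0], "C_C": [0.7, 0.7]}
queries = [([0.9, 0.1], "C_A"), ([0.1, 0.9], "C_C")]
print(acc_at_k(queries, concepts, k=1))  # 0.5
print(acc_at_k(queries, concepts, k=2))  # 1.0
```

Acc@k counts a query as correct when its gold concept appears anywhere among the top k candidates, so for a fixed ranking Acc@5 is never lower than Acc@1. In the toy run, the second query ranks `C_B` first and its gold concept `C_C` second, which is why it misses at k=1 but hits at k=2.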
Pages: 6029-6038
Number of pages: 10