Cross-lingual Unified Medical Language System entity linking in online health communities

Cited by: 4
Authors
Bitton, Yonatan [1]
Cohen, Raphael [1]
Schifter, Tamar [2]
Bachmat, Eitan [1]
Elhadad, Michael [1]
Elhadad, Noemie [3]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
[2] Gertner Inst Epidemiol & Hlth Policy Res, Tel Hashomer, Israel
[3] Columbia Univ, Dept Biomed Informat, New York, NY USA
Keywords
UMLS; natural language processing; online health communities; TEXT
DOI
10.1093/jamia/ocaa150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective: In Hebrew online health communities, participants commonly write medical terms as transliterations of English source terms. Such transliterations introduce high variability into the text and challenge text-analytics methods. To reduce this variability, medical terms must be normalized, for example by linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them to UMLS entities.

Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify medical entities in the Hebrew corpus and link them to UMLS concepts, by first producing a high-recall list of candidate medical terms in the corpus and then filtering the candidates down to relevant medical terms.

Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. On Camoni posts, MDTEL tagging and normalization achieved 99% accuracy, 92% recall, and 87% precision. When terms in queries from the Camoni search logs were tagged and normalized, the UMLS-normalized queries improved search results in 46% of the cases.

Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
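Step (3) of the method describes a two-stage linker: generate a high-recall list of candidate terms, then filter it down to relevant ones. The sketch below illustrates only that general pattern and is not MDTEL's implementation. It rests on several assumptions: Python's difflib string similarity stands in for MDTEL's transliteration-based matching, the three-entry Hebrew lexicon and its placeholder concept IDs (CUI_DIABETES, CUI_INSULIN, CUI_MS) are hypothetical and are not real UMLS CUIs, and the thresholds 0.6 and 0.8 are arbitrary illustrative values.

```python
from difflib import SequenceMatcher

# Hypothetical Hebrew lexicon: surface form -> placeholder concept ID.
# These IDs are illustrative stand-ins, NOT real UMLS CUIs.
LEXICON = {
    "סוכרת": "CUI_DIABETES",       # "diabetes"
    "אינסולין": "CUI_INSULIN",     # transliteration of "insulin"
    "טרשת נפוצה": "CUI_MS",        # "multiple sclerosis" (two tokens)
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for MDTEL's matching."""
    return SequenceMatcher(None, a, b).ratio()


def generate_candidates(tokens, lexicon, max_ngram=2, recall_threshold=0.6):
    """Stage 1 (high recall): keep every n-gram loosely similar to any lexicon term."""
    candidates = []
    for start in range(len(tokens)):
        for n in range(1, max_ngram + 1):
            end = start + n
            if end > len(tokens):
                break
            span = " ".join(tokens[start:end])
            for term, cui in lexicon.items():
                score = similarity(span, term)
                if score >= recall_threshold:
                    candidates.append((start, end, span, cui, score))
    return candidates


def filter_candidates(candidates, precision_threshold=0.8):
    """Stage 2 (precision): keep strong matches, best-first, with no overlapping spans."""
    strong = sorted(
        (c for c in candidates if c[4] >= precision_threshold),
        key=lambda c: (-c[4], c[0]),
    )
    used_positions = set()
    chosen = []
    for start, end, span, cui, score in strong:
        positions = set(range(start, end))
        if positions & used_positions:
            continue  # a higher-scoring candidate already covers these tokens
        used_positions |= positions
        chosen.append((start, span, cui, round(score, 2)))
    # Return links in the order they appear in the text.
    return [(span, cui, score) for _, span, cui, score in sorted(chosen)]


if __name__ == "__main__":
    # "I inject insulin because of diabetes", with "insulin" misspelled as אינסולן.
    tokens = "אני מזריק אינסולן בגלל סוכרת".split()
    candidates = generate_candidates(tokens, LEXICON)
    print(filter_candidates(candidates))
    # -> [('אינסולן', 'CUI_INSULIN', 0.93), ('סוכרת', 'CUI_DIABETES', 1.0)]
```

In this toy run, stage 1 also admits loose bigrams such as "בגלל סוכרת" (similarity of about 0.67), which stage 2 discards. Keeping the two thresholds separate lets recall and precision be tuned independently, which is the motivation the abstract gives for producing a high-recall list first.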
Pages: 1585-1592
Number of pages: 8