Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary

Cited by: 5
Authors
Xu, Yan [1, 2]
Chen, Luoxin [1]
Wei, Junsheng [1]
Ananiadou, Sophia [3]
Fan, Yubo [1]
Qian, Yi [4]
Chang, Eric I-Chao [2]
Tsujii, Junichi [2]
Affiliations
[1] Beihang Univ, Key Lab Biomech & Mechanobiol, State Key Lab Software Dev Environm, Minist Educ, Beijing 100191, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Univ Manchester, Sch Comp Sci, Natl Ctr Text Min, Manchester, Lancs, England
[4] Jinhua Peoples Hosp, Jinhua, Peoples R China
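Source
BMC Bioinformatics | 2015 / Vol. 16
Funding
UK Medical Research Council; US National Science Foundation
Keywords
EXTRACTION
DOI
10.1186/s12859-015-0606-0
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract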
Background: Electronic medical record (EMR) systems are widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual Chinese-English medical lexicon is needed to meet the demand for multilingual and multinational treatment. We extract a bilingual lexicon from English and Chinese discharge summaries, starting from a small seed lexicon. Lexical terms fall into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a context-based label propagation (LP) method to extract candidate translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method that first obtains translation candidates for each component word of a Chinese MWT and then generates their combinations, from which the system selects a set of plausible translation candidates.
Results: We compare our LP method with a baseline based on simple context similarity. The LP-based method outperforms the baseline, achieving 4.44% Acc1, 24.44% Acc10, and 62.22% Acc100, where AccN denotes top-N accuracy. For MWTs, the accuracy of the LP method drops to 5.41% Acc10 and 8.11% Acc20. Our experiments show that the term alignment method improves performance on MWTs to 16.22% Acc10 and 27.03% Acc20.
Conclusions: We constructed a framework for building an English-Chinese term dictionary from discharge summaries in the two languages. Our experiments show that the LP-based method, augmented with the term alignment method, helps reduce the manual work required to compile a bilingual dictionary of clinical terms.
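The MWT term-alignment idea in the abstract (translate each component word, combine the candidates, keep the plausible combinations) and the AccN metric can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy candidate scores, the product-of-scores ranking, the filter against a set of known English terms, and the assumption that component order is preserved are all hypothetical choices made for illustration. AccN is computed under its usual definition: the fraction of source terms whose reference translation appears among the top N candidates.

```python
from itertools import product


def align_mwt(components, word_candidates, english_terms, top_k=3):
    """Rank English candidates for a Chinese MWT given as a list of component words.

    word_candidates maps a component word to (english_word, score) pairs, best first.
    english_terms is a set of English terms treated as plausible (an assumption).
    """
    per_component = [word_candidates.get(c, [])[:top_k] for c in components]
    if any(not cands for cands in per_component):
        return []  # a component without candidates yields no combination
    scored = []
    for combo in product(*per_component):
        # Simplification: keep the source word order when joining components.
        phrase = " ".join(word for word, _ in combo)
        if phrase not in english_terms:
            continue  # discard combinations not seen as English terms
        score = 1.0
        for _, s in combo:
            score *= s  # hypothetical scoring: product of component scores
        scored.append((phrase, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def acc_at_n(ranked_by_term, gold, n):
    """Fraction of source terms whose reference translation is in the top n."""
    hits = sum(
        1 for term, reference in gold.items()
        if reference in ranked_by_term.get(term, [])[:n]
    )
    return hits / len(gold)


# Toy data, invented for illustration only.
word_candidates = {
    "慢性": [("chronic", 0.9), ("acute", 0.2)],
    "肾": [("renal", 0.7), ("kidney", 0.6)],
    "衰竭": [("failure", 0.8), ("exhaustion", 0.3)],
}
english_terms = {
    "chronic renal failure",
    "chronic kidney failure",
    "acute renal failure",
}

ranked = align_mwt(["慢性", "肾", "衰竭"], word_candidates, english_terms)
ranked_by_term = {"慢性肾衰竭": [phrase for phrase, _ in ranked]}
gold = {"慢性肾衰竭": "chronic renal failure"}
top1_accuracy = acc_at_n(ranked_by_term, gold, 1)
```

In this toy setup only three of the eight possible combinations survive the filter, which shows how the selection step prunes implausible candidates before ranking.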
Pages: 10