Comparison of MetaMap and cTAKES for entity extraction in clinical notes

Cited by: 47
Authors
Reategui, Ruth [1, 2]
Ratte, Sylvie [1]
Affiliations
[1] Ecole Technol Super, Montreal, PQ, Canada
[2] Univ Tecn Particular Loja, Loja, Ecuador
Keywords
cTAKES; MetaMap; UMLS; Clinical documents; DISEASES; IDENTIFICATION
DOI
10.1186/s12911-018-0654-2
CLC number
R-058
Abstract
Background: Clinical notes such as discharge summaries have a semi-structured or unstructured format. These documents contain information about diseases, treatments, drugs, etc. Extracting meaningful information from them is challenging because of their narrative format. In this context, we aimed to compare the capacity of two tools, MetaMap and cTAKES, to extract medical entities automatically.
Methods: We worked with the i2b2 (Informatics for Integrating Biology to the Bedside) Obesity Challenge data. Two experiments were constructed. In the first, only a single UMLS concept related to the annotated diseases was extracted. In the second, several UMLS concepts were aggregated.
Results: Results were evaluated against manually annotated medical entities. Results improved with the aggregation process. MetaMap had an average of 0.88 in recall, 0.89 in precision, and 0.88 in F-score. With cTAKES, the average recall, precision, and F-score were 0.91, 0.89, and 0.89, respectively.
Conclusions: The aggregation of concepts (with similar and different semantic types) was shown to be a good strategy for improving the extraction of medical entities, and automatic aggregation could be considered in future work.
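As a rough illustration of the evaluation described in the abstract, the sketch below scores a single-concept extraction and an aggregated-concept extraction against manual annotations using precision, recall, and F-score. It is only a minimal sketch: the concept identifiers, document identifiers, and the helper names `prf` and `aggregate` are hypothetical, and the authors' actual pipeline and data are not reproduced here.

```python
from typing import Dict, Iterable, Set, Tuple


def prf(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for predicted vs. gold document sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


def aggregate(extracted: Dict[str, Set[str]], concept_ids: Iterable[str]) -> Set[str]:
    """Union the documents in which any of the given concepts was extracted."""
    documents: Set[str] = set()
    for concept_id in concept_ids:
        documents |= extracted.get(concept_id, set())
    return documents


# Hypothetical gold standard: documents manually annotated with one disease.
gold_docs = {"d1", "d2", "d3", "d4"}

# Hypothetical tool output: placeholder concept id -> documents where it was found.
extracted = {
    "CUI_MAIN": {"d1", "d2"},
    "CUI_RELATED_A": {"d3"},
    "CUI_RELATED_B": {"d5"},
}

# Experiment 1 (assumed shape): a single concept per disease.
single = prf(aggregate(extracted, ["CUI_MAIN"]), gold_docs)

# Experiment 2 (assumed shape): several related concepts aggregated.
aggregated = prf(
    aggregate(extracted, ["CUI_MAIN", "CUI_RELATED_A", "CUI_RELATED_B"]), gold_docs
)

print("single concept:", single)
print("aggregated:", aggregated)
```

In this toy data, aggregation raises recall because more annotated documents are matched, while precision can drop when a related concept also fires on an unannotated document.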
Pages: 7
Related papers
19 in total
[1] Using text mining techniques to extract phenotypic information from the PhenoCHF corpus [J].
Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia.
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2015, 15
[2] An overview of MetaMap: historical perspective and recent advances [J].
Aronson, Alan R.; Lang, Francois-Michel.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2010, 17 (03): 229-236
[3] Aronson AR, 2001, J AM MED INFORM ASSN, P17
[4] Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language [J].
Becker, Matthias; Boeckmann, Britta.
HEALTH INFORMATICS MEETS EHEALTH, 2016, 223: 71-76
[5] Pneumonia identification using statistical feature selection [J].
Bejan, Cosmin Adrian; Xia, Fei; Vanderwende, Lucy; Wurfel, Mark M.; Yetisgen-Yildiz, Meliha.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2012, 19 (05): 817-823
[6] A simple algorithm for identifying negated findings and diseases in discharge summaries [J].
Chapman, WW; Bridewell, W; Hanbury, P; Cooper, GF; Buchanan, BG.
JOURNAL OF BIOMEDICAL INFORMATICS, 2001, 34 (05): 301-310
[7] Attempting to Use MetaMap in Clinical Practice: A Feasibility Study on the Identification of Medical Concepts from Italian Clinical Notes [J].
Chiaramello, Emma; Paglialonga, Alessia; Pinciroli, Francesco; Tognola, Gabriella.
EXPLORING COMPLEXITY IN HEALTH: AN INTERDISCIPLINARY SYSTEMS APPROACH, 2016, 228: 28-32
[8] Comparison and evaluation of pathway-level aggregation methods of gene expression data [J].
Hwang, Seungwoo.
BMC GENOMICS, 2012, 13
[9] Jonnagaddala J, 2016, DATABASE-OXFORD, V2016, P1
[10] Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives [J].
Kovacevic, Aleksandar; Dehghan, Azad; Filannino, Michele; Keane, John A.; Nenadic, Goran.
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2013, 20 (05): 859-866