Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource

Cited by: 8
Authors
Alnazzawi, Noha [1]
Thompson, Paul [1]
Ananiadou, Sophia [1]
Affiliations
[1] Univ Manchester, Natl Ctr Text Min, Manchester Inst Biotechnol, Manchester, Lancs, England
Funding
UK Medical Research Council;
Keywords
STATE-OF-THE-ART; CONCEPT RECOGNITION; CLINICAL TEXT; NORMALIZATION; EXTRACTION; ONTOLOGY; CORPUS; GENES; UMLS; TASK;
DOI
10.1371/journal.pone.0162287
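Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code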
07 ; 0710 ; 09 ;
Abstract
Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources.
We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
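
The abstract describes linking a phenotype mention to a terminology concept by integrating several similarity measures. The sketch below illustrates that general idea only; it is not PhenoNorm's actual algorithm, whose measures and weighting are not given in this record. The three measures, the unweighted mean, the function names, and the toy concept identifiers (`C-EXAMPLE-*`, which are not real UMLS CUIs) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercased word sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity from difflib (range 0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_dice(a: str, b: str) -> float:
    """Dice coefficient over character trigrams."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}

    ga, gb = grams(a), grams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def combined_score(mention: str, name: str) -> float:
    """Unweighted mean of the three measures (an assumed combination)."""
    scores = [
        token_jaccard(mention, name),
        char_ratio(mention, name),
        trigram_dice(mention, name),
    ]
    return sum(scores) / len(scores)


def link_mention(mention: str, concepts: dict) -> tuple:
    """Return (concept_id, score) for the best-matching concept.

    `concepts` maps a concept identifier to a non-empty list of names
    (preferred term plus synonyms); a concept's score is its best name.
    """
    best_id, best_score = None, -1.0
    for concept_id, names in concepts.items():
        score = max(combined_score(mention, n) for n in names)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id, best_score


# Hypothetical toy terminology; identifiers are placeholders, not UMLS CUIs.
TOY_CONCEPTS = {
    "C-EXAMPLE-1": ["congestive heart failure", "heart failure, congestive"],
    "C-EXAMPLE-2": ["shortness of breath", "dyspnea"],
    "C-EXAMPLE-3": ["peripheral edema", "swelling of legs"],
}

if __name__ == "__main__":
    for text in ["Congestive Heart Failure", "short of breath"]:
        print(text, "->", link_mention(text, TOY_CONCEPTS))
```

Averaging the measures keeps the sketch simple; a real system would more plausibly tune or learn the weights against an annotated corpus, and would need candidate filtering to scale to a resource the size of the UMLS Metathesaurus.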
Pages: 27