Identification of new drug classification terms in textual resources

Cited by: 29
Authors
Kolarik, Corinna [1]
Hofmann-Apitius, Martin
Zimmermann, Marc
Fluck, Juliane
Affiliations
[1] Fraunhofer Inst SCAI, D-53754 St Augustin, Germany
[2] Univ Bonn, Bonn Aachen Int Ctr Informat Technol B IT, D-53113 Bonn, Germany
DOI
10.1093/bioinformatics/btm196
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Knowledge about the biological effects of small molecules helps in understanding biological processes and supports the development of new therapeutic agents. DrugBank is a high-quality database that provides such information about drugs, including annotation of drug effects and classification of therapeutic effects. However, to broaden the scope of such a database in classifying and annotating drugs, systems for the automatic extraction of classification terms and the corresponding annotation of drugs are needed. We have developed an approach for identifying new terms in unstructured text that provide information about drug properties. It is based on the identification and extraction of phrases corresponding to lexico-syntactic patterns (so-called Hearst patterns) that contain drug names and directly related drug annotation terms. Such phrases could be identified with high performance in DrugBank text (0.89 F-score) and in Medline abstracts (0.83 F-score). In comparison to the DrugBank annotation terminology, a large number of new drug annotation terms could be found. The evaluation of terms extracted from Medline showed that 29-53% of them are new, valid drug property terms. They could be assigned to existing drug property classes and to new ones not provided by the DrugBank drug annotation. We conclude that our system can support database content updates by providing additional descriptions of pharmacological drug effects not yet found in databases such as DrugBank. Moreover, we propose that automatic normalization of terms improves the annotation and the retrieval of relevant database entries.
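To illustrate the kind of lexico-syntactic pattern the abstract refers to, here is a minimal Python sketch of a single Hearst pattern ("<class term> such as <drug>, <drug> and <drug>"). It is not the authors' implementation. The drug dictionary `KNOWN_DRUGS`, the regular expression `SUCH_AS` and the function `extract_drug_classes` are all hypothetical names introduced for illustration, and the class phrase is crudely approximated as up to three words before "such as", where a real system would presumably use proper noun-phrase detection.

```python
import re

# Toy drug dictionary (hypothetical); a real system would use a large name resource.
KNOWN_DRUGS = {"warfarin", "heparin", "verapamil", "diltiazem"}

# One simplified Hearst pattern: "<class phrase> such as <name>, <name> and <name>".
# Assumption: the class phrase is at most three words directly before "such as".
SUCH_AS = re.compile(
    r"\b(?P<cls>(?:[A-Za-z][\w-]* ){0,2}[A-Za-z][\w-]*),? such as "
    r"(?P<items>[\w-]+(?:, [\w-]+)*(?:,? (?:and|or) [\w-]+)?)"
)


def extract_drug_classes(sentence):
    """Return sorted (drug, class phrase) pairs found via the 'such as' pattern."""
    pairs = set()
    for match in SUCH_AS.finditer(sentence):
        cls = match.group("cls").lower()
        # Split the enumeration "a, b and c" into its individual items.
        for item in re.split(r",? (?:and|or) |, ", match.group("items")):
            # Keep only items that the dictionary recognizes as drug names.
            if item.lower() in KNOWN_DRUGS:
                pairs.add((item.lower(), cls))
    return sorted(pairs)
```

For example, calling `extract_drug_classes("Anticoagulants such as warfarin and heparin inhibit clotting.")` pairs both drug names with the class term "anticoagulants". Candidate terms obtained this way would still need the kind of manual or automatic validation the abstract describes, since the word-window heuristic can also capture words that are not part of the class term.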
Pages: I264-I272
Page count: 9