Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts

Cited by: 17
Authors
Chun, Hong-Woo [1]
Tsuruoka, Yoshimasa
Kim, Jin-Dong
Shiba, Rie
Nagata, Naoki
Hishiki, Teruyoshi
Tsujii, Jun'ichi
Affiliations
[1] Univ Tokyo, Dept Comp Sci, Grad Sch Informat Sci & Technol, Tokyo, Japan
[2] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[3] NaCTeM, Manchester, Lancs, England
Keywords
Prostate Cancer; Unified Medical Language System; Named Entity Recognition; Candidate Term; Named Entity
DOI
10.1186/1471-2105-7-S3-S4
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Automatic recognition of relations between a specific disease term and the gene or protein terms relevant to it is an important task in bioinformatics. To make the results more useful, we identified prostate cancer and gene terms and tagged them with the IDs of public biomedical databases. Because genetics experts are the intended users, we also classified the recognized relations into six topics that can be used to analyze the types of prostate cancers, genes, and their relations.
Methods: We developed a maximum entropy-based named entity recognizer and a relation recognizer and used them in a corpus-based approach. We collected prostate cancer-related abstracts from MEDLINE, and biologists annotated them to build a corpus of relations between genes and prostate cancer, organized by the six topics. We used this corpus to train the named entity recognizer and the relation recognizer.
Results: Topic-classified relation recognition achieved 92.1% precision for the relation (an increase of 11.0% over a baseline experiment). Across the individual topics, precision ranged from 67.6% to 88.1%.
Conclusion: The experiments yielded two main findings: a carefully designed relation recognition system that uses named entity recognition can improve the performance of relation recognition, and topic-classified relation recognition can be addressed effectively through a corpus-based approach that combines manual annotation with machine learning.
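The abstract names maximum entropy models but gives no implementation details. As a rough illustration only, the following is a minimal sketch of a maximum entropy (multinomial logistic) classifier that decides whether a candidate gene and disease pair expresses a relation. It is not the authors' system: the feature names, the toy examples, the label set, and the training settings are all hypothetical and chosen only to make the sketch runnable.

```python
import math
from collections import defaultdict

# Hypothetical label set and toy training pairs (not taken from the paper).
LABELS = ["relation", "no_relation"]
TOY_EXAMPLES = [
    ({"verb=overexpressed", "gene_before_disease"}, "relation"),
    ({"verb=associated", "gene_before_disease"}, "relation"),
    ({"verb=measured", "different_clause"}, "no_relation"),
    ({"verb=excluded", "different_clause"}, "no_relation"),
]


def predict_proba(weights, feats, labels):
    """Return P(label | feats) under a maximum entropy model."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_maxent(examples, labels, epochs=200, lr=0.5):
    """Fit weights by batch gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                for y in labels:
                    # Observed feature count minus the model's expected count.
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights


def precision(predicted, gold, positive="relation"):
    """Precision = true positives / all items predicted as positive."""
    predicted_pos = [g for p, g in zip(predicted, gold) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(1 for g in predicted_pos if g == positive) / len(predicted_pos)


model = train_maxent(TOY_EXAMPLES, LABELS)
probs = predict_proba(model, {"verb=overexpressed", "gene_before_disease"}, LABELS)
print(max(probs, key=probs.get), probs)
```

A real system would add regularization and much richer features (for example, the words between the two entities), and the `precision` helper only shows how the metric reported in the Results is defined.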
Pages: 8