RelEx - Relation extraction using dependency parse trees

Cited by: 339
Authors
Fundel, Katrin [1]
Kueffner, Robert [1]
Zimmer, Ralf [1]
Affiliations
[1] Univ Munich, Inst Informat, D-80333 Munich, Germany
Keywords
DOI
10.1093/bioinformatics/btl616
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations, such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases.
Results: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing that produces dependency parse trees, followed by the application of a small number of simple rules to these trees. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted approximately 150 000 relations, with an estimated performance of both 80% precision and 80% recall.
Availability: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website (http://www.bio.ifi.lmu.de/publications/RelEx/).
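The abstract does not spell out the rules RelEx applies, so the following is only a minimal sketch of the general idea of applying a simple rule to a dependency parse tree. It assumes a hand-built toy parse of the sentence "MDM2 inhibits p53", a hypothetical list of relation terms, and a single illustrative rule: report a relation when a relation term lies on the dependency path connecting two entities. All names (`EDGES`, `RELATION_TERMS`, `extract_relations`) are made up for illustration and are not taken from the paper.

```python
from collections import deque

# Hypothetical toy dependency parse of "MDM2 inhibits p53".
# Each edge is (head, dependent, label); tokens are assumed to be unique.
EDGES = [
    ("inhibits", "MDM2", "subj"),
    ("inhibits", "p53", "obj"),
]
# Hypothetical relation term list and entity set (assumptions, not from the paper).
RELATION_TERMS = {"inhibits", "activates", "binds"}
ENTITIES = {"MDM2", "p53"}


def build_graph(edges):
    """Turn the dependency edges into an undirected adjacency map."""
    graph = {}
    for head, dependent, _label in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest token path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def extract_relations(edges, entities, relation_terms):
    """Illustrative rule: a relation term on the path between two entities."""
    graph = build_graph(edges)
    ordered = sorted(entities)
    found = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            path = shortest_path(graph, first, second)
            if path is None:
                continue
            # Only inner tokens of the path can act as the relation term.
            for token in path[1:-1]:
                if token in relation_terms:
                    found.append((first, token, second))
    return found


if __name__ == "__main__":
    print(extract_relations(EDGES, ENTITIES, RELATION_TERMS))
```

This sketch treats the tree as undirected and ignores edge labels, so it does not decide which entity is the agent and which is the target; a real system would also need a parser and a named entity recognizer to produce the inputs.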
Pages: 365-371
Page count: 7