What makes a gene name? Named entity recognition in the biomedical literature

Cited by: 87
Authors
Leser, U [1]
Hakenberg, J [1]
Affiliation
[1] Humboldt Univ, Dept Comp Sci, D-12489 Berlin, Germany
Keywords
text mining; knowledge management; information extraction; machine learning; named entity recognition
DOI
10.1093/bib/6.4.357
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The recognition of biomedical concepts in natural text (named entity recognition, NER) is a key technology for automatic or semi-automatic analysis of textual resources. Precise NER tools are a prerequisite for many applications working on text, such as information retrieval, information extraction or document classification. In recent years, the problem has received considerable attention in the bioinformatics community, and experience has shown that NER in the life sciences is a rather difficult problem. Several systems and algorithms have been devised and implemented. In this paper, the problems and resources in NER research are described, the principal algorithms underlying most systems are sketched, and the current state of the art in the field is surveyed.
Pages: 357-369
Page count: 13