Exploring the boundaries: gene and protein identification in biomedical text

Cited by: 49
Authors
Finkel, J
Dingare, S
Manning, CD [1]
Nissim, M
Alex, B
Grover, C
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Edinburgh, Inst Communicating & Collaborat Syst, Edinburgh EH8 9YL, Midlothian, Scotland
Keywords
External Resource; Named Entity Recognition; Biomedical Domain; Biomedical Text; GENIA Corpus
DOI
10.1186/1471-2105-6-S1-S5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.
Methods: We present a maximum-entropy-based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.
Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.
Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
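The abstract reports precision and recall but no combined score. As a rough aid for comparing the two evaluations, the sketch below computes the balanced F-measure (the harmonic mean of precision and recall) from the figures quoted above. The F-measure values are derived here and are not stated in this record; the function name `f1_score` and the dictionary layout are illustrative choices, not anything taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract for each evaluation.
reported = {
    "open": (0.83, 0.84),
    "closed": (0.78, 0.85),
}

# Derived values (not reported in the record itself).
derived_f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, value in derived_f1.items():
    print(f"{name}: F1 = {value:.3f}")
```

Under these figures the "open" configuration comes out slightly ahead on the combined score, mainly because of its higher precision.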
Pages: 9