A comparative study for biomedical named entity recognition

Cited by: 40
Authors
Wang, Xu [1]
Yang, Chen [2]
Guan, Renchu [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Coll Earth Sci, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Biomedical named entity recognition; Machine learning; HMM; CRF; DICTIONARY; TEXT;
DOI
10.1007/s13042-015-0426-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With high-throughput technologies applied in biomedical research, the quantity of biomedical literature grows exponentially. Extracting knowledge from manuscripts quickly and accurately is increasingly important, especially in the era of big data. Named entity recognition (NER), which aims to identify chunks of text that refer to specific entities, is essentially the initial step of information extraction. In this paper, we review the three models of biomedical NER and two well-known machine learning methods, the Hidden Markov Model and Conditional Random Fields, which have been widely applied in bioinformatics. Based on these two methods, six biomedical NER tools are compared in terms of programming language, feature sets, underlying mathematical methods, post-processing techniques and flowcharts. Experimental results for these tools on two widely used corpora, GENETAG and JNLPBA, are reported. The comparison ranges from individual entity types to overall performance. Furthermore, we put forward suggestions about the selection of Bio-NER tools for different applications.
Pages: 373-382
Page count: 10