A comparative study for biomedical named entity recognition

Cited by: 0
Authors
Xu Wang
Chen Yang
Renchu Guan
Affiliations
[1] Jilin University, College of Computer Science and Technology
[2] Jilin University, College of Earth Sciences
Source
International Journal of Machine Learning and Cybernetics | 2018 / Vol. 9
Keywords
Biomedical named entity recognition; Machine learning; HMM; CRF
DOI
Not available
Abstract
With high-throughput technologies applied in biomedical research, the volume of biomedical literature is growing exponentially. Extracting knowledge from manuscripts quickly and accurately is becoming increasingly important, especially in the era of big data. Named entity recognition (NER), which aims to identify chunks of text that refer to specific entities, is the initial step of information extraction. In this paper, we review the three models of biomedical NER and two well-known machine learning methods, the Hidden Markov Model (HMM) and Conditional Random Fields (CRF), both of which have been widely applied in bioinformatics. Six biomedical NER tools based on these two methods are compared in terms of programming language, feature sets, underlying mathematical methods, post-processing techniques, and flowcharts. Experimental results for these tools on two widely used corpora, GENETAG and JNLPBA, are reported. The comparison ranges from performance on individual entity types to overall performance. Furthermore, we offer suggestions on selecting Bio-NER tools for different applications.
Pages: 373-382
Page count: 9