Biomedical named entity recognition using generalized expectation criteria

Cited by: 0
Authors
Lin Yao
Chengjie Sun
Yan Wu
Xiaolong Wang
Xuan Wang
Affiliations
[1] Harbin Institute of Technology Shenzhen Graduate School, Department of Computer Science
[2] Harbin Institute of Technology, School of Computer Science and Technology
[3] Harbin Institute of Technology, School of Software
Source
International Journal of Machine Learning and Cybernetics | 2011 / Vol. 2
Keywords
Conditional random field; Generalized expectation; Latent Dirichlet allocation; Biomedical named entity recognition; Semi-supervised learning
DOI
Not available
Abstract
Applying machine learning is difficult in domains that lack labeled training data. Biomedical named entity recognition (NER) is one such domain, and it remains challenging because of its extraordinarily complex nomenclature. In this paper, we propose a semi-supervised method that trains conditional random field (CRF) models using generalized expectation (GE) criteria to address biomedical NER. Instead of labeling instances, the method labels features to obtain training data, which saves considerable labeling time. A latent Dirichlet allocation (LDA) model is used to select the features to label. Experimental results show that the proposed method can dramatically improve biomedical NER performance by incorporating unlabeled data through feature labeling.
Pages: 235-243
Page count: 8