Named entity recognition based on conditional random fields

Times cited: 21
Authors
Song, Shengli [1]
Zhang, Nan [2]
Huang, Haitao [2]
Affiliations
[1] Xidian Univ, Inst Software Engn, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Shaanxi, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2019 / Vol. 22 / Suppl. 3
Keywords
Named entity recognition; Conditional random fields; Graininess; INFORMATION; SYSTEM
DOI
10.1007/s10586-017-1146-3
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Named entity recognition (NER) is a fundamental problem in many natural language processing applications. Combining word segmentation and part-of-speech analysis, the paper proposes a new NER method based on conditional random fields that takes the granularity (graininess) of candidate entities into account. The recognition granularity is divided into two levels: word-based and character-based. Features are extracted from the segmented text according to the feature templates trained in the training phase, and P(y|x) is then calculated to obtain the best label sequence for the input sequence. The paper evaluates the algorithm experimentally at different granularities on a large-scale corpus, and the results indicate that the method is feasible and of research value.
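The abstract does not spell out P(y|x). For orientation, the standard linear-chain conditional random field form is sketched below; the feature functions f_k, weights \lambda_k, and sequence length T are notation assumed here, not taken from the paper, whose exact formulation may differ.

```latex
% Standard linear-chain CRF (assumed notation, not the paper's own):
% x = input sequence (words or characters), y = label sequence of length T
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)

% Decoding selects the most probable label sequence:
y^{*} = \arg\max_{y} P(y \mid x)
```

In this form, the feature templates determine which functions f_k are instantiated from the segmented text, and the argmax is typically computed with the Viterbi algorithm.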
Pages: S5195-S5206
Number of pages: 12