A Hybrid Method for Persian Named Entity Recognition

Cited by: 0
Authors
Ahmadi, Farid [1]
Moradi, Hamed [1]
Affiliation
[1] Urmia Univ Technol, Dept Informat Technol, Orumiyeh, Iran
Source
2015 7th Conference on Information and Knowledge Technology (IKT) | 2015
Keywords
Information Retrieval; Text Processing; Natural Language Processing; Languages and Systems; Named Entity Recognition; Hidden Markov Model; Information Extraction
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Named Entity Recognition (NER) is an information extraction subtask that recognizes named entities in unstructured text and classifies them into predefined categories such as the names of people, organizations, and locations. Machine learning approaches such as the Hidden Markov Model (HMM), as well as hybrid methods, are now frequently used for NER. Because no publicly available data sets exist for NER in Persian, to our knowledge no machine-learning-based Persian NER system exists. To compensate for the inherent weaknesses of HMMs, this paper combines an HMM with a rule-based method to recognize named entities in Persian text; the combination yields highly accurate recognition. The machine learning component of the proposed system uses an HMM and the Viterbi algorithm, and the rule-based component employs a set of lexical resources and pattern bases to recognize the names of people, locations, and organizations. For this study, we annotated our own training and test data sets. On an annotated test corpus of 32,606 tokens, the hybrid approach achieves 89.73% precision, 82.44% recall, and an F-measure of 85.93% on Persian text.
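The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: Viterbi decoding over a first-order HMM, followed by a lexicon lookup that overrides the statistical tag. Everything specific here is an assumption made for illustration and is not taken from the paper: the tag set (`O`, `PER`, `LOC`), the transliterated tokens, the toy probabilities, the `unk` smoothing constant, and the helper names `viterbi`, `apply_gazetteer`, and `f_measure`. The last function is the standard harmonic mean of precision and recall, included because it reproduces the reported F-measure from the reported precision and recall.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely state sequence for `tokens` under a first-order HMM."""
    if not tokens:
        return []

    def emit(state, tok):
        # Unseen tokens get a small constant probability (assumed smoothing).
        return math.log(emit_p[state].get(tok, unk))

    # Log space avoids numeric underflow on long sentences.
    best = [{s: math.log(start_p[s]) + emit(s, tokens[0]) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + emit(s, tokens[t])
            back[t][s] = prev

    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def apply_gazetteer(tokens, tags, gazetteer):
    """Rule-based pass: a lexicon entry overrides the HMM tag for that token."""
    return [gazetteer.get(tok, tag) for tok, tag in zip(tokens, tags)]

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy model; every number below is hypothetical.
states = ["O", "PER", "LOC"]
start_p = {"O": 0.6, "PER": 0.3, "LOC": 0.1}
trans_p = {
    "O": {"O": 0.7, "PER": 0.15, "LOC": 0.15},
    "PER": {"O": 0.8, "PER": 0.15, "LOC": 0.05},
    "LOC": {"O": 0.8, "PER": 0.05, "LOC": 0.15},
}
emit_p = {
    "O": {"dar": 0.4, "zendegi": 0.3, "mikonad": 0.3},
    "PER": {"ali": 0.9},
    "LOC": {"tehran": 0.9},
}

# "urmia" never occurs in the toy emission table, so the HMM alone misses it.
tokens = ["ali", "dar", "urmia", "zendegi", "mikonad"]
hmm_tags = viterbi(tokens, states, start_p, trans_p, emit_p)
hybrid_tags = apply_gazetteer(tokens, hmm_tags, {"urmia": "LOC"})

print(hmm_tags)
print(hybrid_tags)
print(round(f_measure(89.73, 82.44), 2))
```

In this toy run the HMM labels the unseen location as `O`, and the lexicon lookup corrects it to `LOC`. That is one plausible way a rule-based component can compensate for sparse training data, though the paper's actual rules and pattern bases may work differently.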
Pages: 7