Character Gazetteer for Named Entity Recognition with Linear Matching Complexity

Times cited: 0
Authors
Dlugolinsky, Stefan [1]
Nguyen, Giang [1]
Laclavik, Michal [1]
Seleng, Martin [1]
Affiliation
[1] Slovak Acad Sci, Inst Informat, Dubravska Cesta 9, Bratislava 84507, Slovakia
Source
2013 THIRD WORLD CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGIES (WICT) | 2013
Keywords
gazetteer; named entity recognition; natural language processing; text processing; tokenization
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A large amount of unstructured data is produced daily through numerous media. Although computer systems, including commodity hardware, are becoming more powerful, processing such data and extracting useful information in a time-efficient manner remains a problem. One of the domains of unstructured data processing is Natural Language Processing (NLP), which covers areas such as information extraction, machine translation, word sense disambiguation, and automated question answering. All of these areas require fast and precise Named Entity Recognition (NER), which is non-trivial because of the size and heterogeneity of the processed data. Our goal in this research area is to provide fast tokenization and precise NER with linear complexity. In this paper, we present a character gazetteer that performs tokenization and NER in linear time, and we compare two tree representations of it: a multiway tree implemented with hash maps and a first child-next sibling binary tree. Our measurements show that one representation outperforms the other in processing time, while the other is more efficient in memory consumption.
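The abstract contrasts two tree representations of a character gazetteer. The Python sketch below is not the authors' implementation: it assumes the gazetteer is a character trie whose terminal nodes carry an entity label, and the names `HashTrie`, `FcnsTrie` and `find_entities`, as well as the matching policy (greedy longest match starting and ending on alphanumeric token boundaries), are hypothetical illustrations. Both representations expose the same `step` transition, so one scan function works with either.

```python
# Hypothetical sketch of a character gazetteer; not the paper's implementation.
from typing import Dict, List, Optional, Tuple, Union


class HashNode:
    """Multiway trie node: children are indexed by character in a dict (hash map)."""
    __slots__ = ("children", "label")

    def __init__(self) -> None:
        self.children: Dict[str, "HashNode"] = {}
        self.label: Optional[str] = None  # entity type if a gazetteer entry ends here


class HashTrie:
    def __init__(self) -> None:
        self.root = HashNode()

    def step(self, node: HashNode, ch: str) -> Optional[HashNode]:
        # Expected O(1) transition via dict lookup, at the cost of one dict per node
        return node.children.get(ch)

    def add(self, entry: str, label: str) -> None:
        node = self.root
        for ch in entry:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = HashNode()
            node = child
        node.label = label


class FcnsNode:
    """First child-next sibling node: one character plus two links."""
    __slots__ = ("ch", "first_child", "next_sibling", "label")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.first_child: Optional["FcnsNode"] = None
        self.next_sibling: Optional["FcnsNode"] = None
        self.label: Optional[str] = None


class FcnsTrie:
    def __init__(self) -> None:
        self.root = FcnsNode("")  # sentinel root; its character is unused

    def step(self, node: FcnsNode, ch: str) -> Optional[FcnsNode]:
        # Linear scan of the sibling list: no per-node table, but more comparisons
        child = node.first_child
        while child is not None and child.ch != ch:
            child = child.next_sibling
        return child

    def add(self, entry: str, label: str) -> None:
        node = self.root
        for ch in entry:
            child = self.step(node, ch)
            if child is None:
                child = FcnsNode(ch)
                child.next_sibling = node.first_child  # prepend to the sibling list
                node.first_child = child
            node = child
        node.label = label


Trie = Union[HashTrie, FcnsTrie]


def find_entities(trie: Trie, text: str) -> List[Tuple[int, int, str]]:
    """Greedy longest match of entries that start and end on token boundaries.

    Each start position walks at most as far as the longest entry, so for a
    fixed gazetteer the scan is linear in len(text).
    """
    matches: List[Tuple[int, int, str]] = []
    i, n = 0, len(text)
    while i < n:
        # Assumed tokenization rule: a token starts at an alphanumeric character
        # that is not preceded by another alphanumeric character.
        if not text[i].isalnum() or (i > 0 and text[i - 1].isalnum()):
            i += 1
            continue
        node, j, best = trie.root, i, None
        while j < n:
            node = trie.step(node, text[j])
            if node is None:
                break
            j += 1
            # Accept an entry only if it ends at a token boundary
            if node.label is not None and (j == n or not text[j].isalnum()):
                best = (i, j, node.label)
        if best is not None:
            matches.append(best)
            i = best[1]  # continue after the matched entity
        else:
            i += 1
    return matches


if __name__ == "__main__":
    gazetteer = [("New York", "LOC"), ("New York Times", "ORG"), ("Bratislava", "LOC")]
    text = "The New York Times reported from Bratislava, not Bratislavan."
    for trie in (HashTrie(), FcnsTrie()):
        for entry, label in gazetteer:
            trie.add(entry, label)
        # both print [('New York Times', 'ORG'), ('Bratislava', 'LOC')]
        print([(text[s:e], lab) for s, e, lab in find_entities(trie, text)])
```

The hash-map version pays for a dictionary at every node in exchange for expected constant-time transitions, while the first child-next sibling version stores only two links per node but scans sibling lists. This illustrates the kind of time/memory trade-off the abstract reports; the sketch does not reproduce the paper's measurements.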
Pages: 361-365
Number of pages: 5