Towards Bangla Named Entity Recognition

Cited by: 0
Authors
Chowdhury, Shammur Absar [1]
Alam, Firoj [2]
Khan, Naira [3]
Affiliations
[1] Univ Trento, Trento, Italy
[2] QCRI, Doha, Qatar
[3] Dhaka Univ, Dhaka, Bangladesh
Source
2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT) | 2018
Keywords
Bangla; Named Entity Recognition; Sequence Labeling; CRF; Neural Network; LSTM;
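DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract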
Named Entity Recognition (NER) is one of the fundamental problems in Information Extraction; the task is to find the entities mentioned in text. Over the years there has been significant progress in NER research for resource-rich languages such as English, Chinese, and Italian. Although there are a number of studies on Bangla NER, most of them were conducted almost a decade ago and focused on a single geographical location (i.e., India). Therefore, in this paper, we present a corpus annotated with seven named entity types, with a particular focus on Bangladeshi Bangla. It is part of the development of the Bangla Content Annotation Bank (B-CAB). We also present baseline results, which can be useful for future research. For the baseline results, we employed word-level, POS, gazetteer, and contextual features along with Conditional Random Fields (CRFs). Our study also includes an exploration of deep neural networks. Additionally, we investigated another large corpus from a different geographical location (i.e., India) and concluded that geography-specific NER is important for a language.
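The abstract names four feature families for the CRF baseline (word-level, POS, gazetteer, and contextual) without giving details. The following is a minimal sketch of what per-token feature extraction of that kind might look like; it is not the authors' feature set. The function names, the specific features (3-character prefix and suffix, a one-token context window), the POS tags, and the gazetteer contents are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# A sentence is a list of (token, POS tag) pairs. The tag names are hypothetical.
Sentence = List[Tuple[str, str]]


def token_features(sentence: Sentence, i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Build an illustrative feature dictionary for the token at position i."""
    word, pos = sentence[i]
    feats: Dict[str, object] = {
        # Word-level features (assumed; the record does not list them)
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        # POS feature
        "pos": pos,
        # Gazetteer feature: exact-match lookup in a set of known names
        "in_gazetteer": word in gazetteer,
    }
    # Contextual features: previous and next token within a window of one
    if i > 0:
        prev_word, prev_pos = sentence[i - 1]
        feats["prev_word"] = prev_word
        feats["prev_pos"] = prev_pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        next_word, next_pos = sentence[i + 1]
        feats["next_word"] = next_word
        feats["next_pos"] = next_pos
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sentence: Sentence, gazetteer: Set[str]) -> List[Dict[str, object]]:
    """Extract one feature dictionary per token in the sentence."""
    return [token_features(sentence, i, gazetteer) for i in range(len(sentence))]
```

A list of dictionaries per sentence, paired with a list of entity labels, is the kind of input that CRF toolkits such as sklearn-crfsuite accept, so a sketch like this could feed a sequence labeler directly.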
Pages: 7