Fine-Grained Named Entity Recognition for Sinhala

Times cited: 0
Authors
Azeez, Rameela [1]
Ranathunga, Surangika [1]
Affiliation
[1] Univ Moratuwa, Dept Comp Sci & Engn, Katubedda 10400, Sri Lanka
Source
MERCON 2020: 6TH INTERNATIONAL MULTIDISCIPLINARY MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON) | 2020
Keywords
named entity recognition; Sinhala; named entity; conditional random fields
DOI
10.1109/mercon50084.2020.9185296
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
For English, Named Entity Recognition (NER) is largely considered a solved problem. However, little NER research has been done for low-resourced and morphologically rich languages such as Sinhala. In this paper, we present a novel fine-grained Named Entity (NE) tag set and an NE-annotated Sinhala corpus of 70k word tokens. We trained a custom NER model for Sinhala based on Conditional Random Fields (CRF). Despite the low-resourced setting, this NER model achieves a micro-averaged F1 score of 84.8.
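The abstract reports a micro-averaged F1 score but does not state here whether scoring is done per token or per entity. The following is a minimal sketch assuming entity-level, exact-match scoring over BIO-tagged sentences. The BIO scheme, the PER/LOC/ORG tags, and the helper names `bio_to_spans` and `micro_f1` are illustrative assumptions, not the authors' tag set or evaluation code.

```python
from typing import List, Optional, Set, Tuple

# (sentence index, start token, end token exclusive, entity type)
Span = Tuple[int, int, int, str]


def bio_to_spans(sent_idx: int, tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans (assumed BIO scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any span still open
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((sent_idx, start, i, etype))
                start, etype = None, None
            if tag.startswith(("B-", "I-")):
                # A stray I- tag (no matching open span) is treated as a new span start.
                start, etype = i, tag[2:]
    return spans


def micro_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F1: counts are pooled over all entity types."""
    tp = len(gold & pred)  # exact match on boundaries and type
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the paper's corpus.
    gold_tags = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG"]]
    pred_tags = [["B-PER", "I-PER", "O", "B-ORG"], ["O", "B-ORG", "O"]]
    gold = set().union(*(bio_to_spans(i, t) for i, t in enumerate(gold_tags)))
    pred = set().union(*(bio_to_spans(i, t) for i, t in enumerate(pred_tags)))
    print(micro_f1(gold, pred))
```

In the toy data only the PER span matches exactly (the LOC span is mistyped and the ORG span is truncated), so precision, recall, and F1 all come out to 1/3. Because micro-averaging pools true positives across every entity type before dividing, frequent types dominate the score; a macro average over types would weight rare fine-grained types more heavily.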
Pages: 295-300
Number of pages: 6