Government Domain Named Entity Recognition for South African Languages

Authors
Eiselen, Roald [1]
Affiliations
[1] North West Univ, Ctr Text Technol, Potchefstroom Campus, Potchefstroom, South Africa
Source
LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016
Keywords
Language resource development; South African languages; named entity recognition
DOI
Not available
Chinese Library Classification
H [Language, Writing]
Subject classification code
05
Abstract
This paper describes the named entity language resources developed as part of a development project for the South African languages. The development efforts focused on creating protocols and annotated data sets with at least 15,000 annotated named entity tokens for ten of the official South African languages. The description of the protocols and annotated data sets provides an overview of the problems encountered during annotation. Based on these annotated data sets, CRF named entity recognition systems are developed that leverage existing linguistic resources. The newly created named entity recognisers are evaluated, achieving F-scores of between 0.64 and 0.77, and an error analysis is performed to identify possible avenues for improving the quality of the systems.
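For readers unfamiliar with the F-score reported in the abstract, the following is a minimal sketch of how a token-level F-score can be computed. The abstract does not state whether the paper scores per token or per entity, so token-level scoring is an assumption here, as are the function name `token_f_score`, the label names (`PERS`, `ORG`, `LOC`, `OUT`), and the toy data, none of which come from the paper.

```python
def token_f_score(gold, predicted, outside="OUT"):
    """Return (precision, recall, F1) over tokens labelled as entities.

    Assumption: scoring is per token, and `outside` marks non-entity tokens.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # A true positive is an entity token whose predicted label matches gold.
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == p and g != outside)
    pred_pos = sum(1 for p in predicted if p != outside)  # tokens tagged as entity
    gold_pos = sum(1 for g in gold if g != outside)       # tokens that are entities

    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example, not data from the paper.
gold = ["PERS", "OUT", "ORG", "ORG", "OUT", "LOC"]
pred = ["PERS", "OUT", "ORG", "OUT", "LOC", "LOC"]
precision, recall, f1 = token_f_score(gold, pred)
```

The F-score is the harmonic mean of precision and recall, so a system only scores well when it both finds most entity tokens and labels few non-entity tokens by mistake.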
Pages: 3344-3348
Number of pages: 5