The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

Cited by: 1
Authors
Batista, Filipe [1]
Figueira, Alvaro
Affiliations
[1] INESC TEC, CRACS, Rua Campo Alegre 1021-1055, P-4169007 Porto, Portugal
Source
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017) | 2017 / Vol. 10423
Keywords
Named Entity Recognition; Social media; Ensemble of NLP toolkits; Text-mining; Machine learning;
DOI
10.1007/978-3-319-65340-2_65
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we study the combined use of four NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) in the context of social media posts. Previous studies have compared the performance of these tools on both news and social media corpora. We go further by examining how differently these toolkits predict named entities, in terms of precision and recall for three entity types, and how they can complement each other to achieve a combined performance superior to that of any individual toolkit. Experiments on two publicly available datasets, from the WNUT-2015 and #MSM2013 workshops, show that an ensemble of toolkits can improve the recognition of specific entity types: by up to 10.62% for Person, 1.97% for Location and 1.31% for Organization, depending on the dataset and the voting criteria. Our results also show improvements of 3.76% and 1.69% on the two datasets, respectively, in average performance across the three entity types.
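The abstract does not spell out the voting criteria, so the following is only a minimal sketch of one plausible scheme: token-level voting in which an entity label is kept when at least `min_votes` toolkits agree on it. The function name `vote`, the `min_votes` threshold, the label names and the toy predictions are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

OUTSIDE = "O"  # label for tokens that are not part of any entity (assumed convention)


def vote(predictions, min_votes=2):
    """Combine token-level labels from several toolkits by voting.

    predictions: one list of labels per toolkit, all aligned to the same tokens.
    min_votes:   hypothetical threshold; a label is kept only if at least this
                 many toolkits propose it, otherwise the token is left unlabeled.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all toolkits must label the same tokens")

    combined = []
    for labels in zip(*predictions):
        # Count only entity labels; "O" never competes with an entity type.
        counts = Counter(label for label in labels if label != OUTSIDE)
        if not counts:
            combined.append(OUTSIDE)
            continue
        # Ties are broken in favour of the toolkit listed first.
        label, votes = counts.most_common(1)[0]
        combined.append(label if votes >= min_votes else OUTSIDE)
    return combined


# Made-up outputs of four toolkits for five tokens (illustration only).
tokens = ["Filipe", "visited", "Porto", "with", "INESC"]
toolkit_outputs = [
    ["PER", "O", "LOC", "O", "ORG"],
    ["PER", "O", "LOC", "O", "O"],
    ["O",   "O", "ORG", "O", "ORG"],
    ["PER", "O", "O",   "O", "O"],
]

print(list(zip(tokens, vote(toolkit_outputs, min_votes=2))))
```

With `min_votes=2` the toy input keeps PER, LOC and ORG for the first, third and fifth tokens; raising the threshold to 3 keeps only PER. A lower threshold generally favours recall and a higher one favours precision, which is one way an outcome could depend on the voting criteria.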
Pages: 803-814
Page count: 12