KIND: an Italian Multi-Domain Dataset for Named-Entity Recognition

Cited by: 0
Authors
Paccosi, Teresa [1, 2]
Aprosio, Alessio Palmero [1]
Affiliations
[1] Fdn Bruno Kessler, Via Sommarive 18, Trento, Italy
[2] Univ Trento, Corso Bettini 84, Rovereto, Italy
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Named-entity recognition; Italian language; Natural Language Processing
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper we present KIND, an Italian dataset for named-entity recognition (NER). It contains more than one million tokens, annotated with three classes: person, location, and organization. Most of the dataset (around 600K tokens) consists of manual gold annotations in three different domains (news, literature, and political discourses); the remainder is a semi-automatically annotated part. The multi-domain design is the main strength of this work: the resource covers different styles and language uses, and it is the largest Italian NER dataset with manual gold annotations. It is therefore an important resource for training NER systems in Italian. Texts and annotations are freely downloadable from the GitHub repository.
Pages: 501-507
Number of pages: 7