KIND: an Italian Multi-Domain Dataset for Named-Entity Recognition

Cited by: 0
Authors
Paccosi, Teresa [1, 2]
Aprosio, Alessio Palmero [1]
Affiliations
[1] Fondazione Bruno Kessler, Via Sommarive 18, Trento, Italy
[2] University of Trento, Corso Bettini 84, Rovereto, Italy
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Named-entity recognition; Italian language; Natural Language Processing
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper we present KIND, an Italian dataset for named-entity recognition (NER). It contains more than one million tokens, with annotations covering three classes: person, location, and organization. Most of the dataset (around 600K tokens) contains manual gold annotations in three different domains (news, literature, and political discourses); the rest is semi-automatically annotated. The multi-domain coverage is the main strength of the present work: the resource spans different styles and language uses and is the largest Italian NER dataset with manual gold annotations. It represents an important resource for training NER systems in Italian. Texts and annotations are freely downloadable from the GitHub repository.
Pages: 501-507
Page count: 7