KIND: an Italian Multi-Domain Dataset for Named-Entity Recognition

Authors
Paccosi, Teresa [1, 2]
Aprosio, Alessio Palmero [1]
Affiliations
[1] Fondazione Bruno Kessler, Via Sommarive 18, Trento, Italy
[2] University of Trento, Corso Bettini 84, Rovereto, Italy
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Named-entity recognition; Italian language; Natural Language Processing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
In this paper we present KIND, an Italian dataset for named-entity recognition. It contains more than one million tokens, with annotations covering three classes: person, location, and organization. Most of the dataset (around 600K tokens) carries manual gold annotations in three different domains (news, literature, and political discourses); the rest is semi-automatically annotated. The multi-domain coverage is the main strength of the present work: it offers a resource that covers different styles and language uses, and it is the largest Italian NER dataset with manual gold annotations. It represents an important resource for training NER systems in Italian. Texts and annotations are freely downloadable from the GitHub repository.
Pages: 501-507
Number of pages: 7