Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

Cited by: 5
Authors
Ehrmann, Maud [1]
Romanello, Matteo [2]
Najem-Meyer, Sven [1]
Doucet, Antoine [3]
Clematide, Simon [4]
Affiliations
[1] EPFL, Digital Humanities Lab, Vaud, Switzerland
[2] Univ Lausanne, Lausanne, Switzerland
[3] Univ La Rochelle, La Rochelle, France
[4] Univ Zurich, Dept Computat Linguist, Zurich, Switzerland
Source
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2022) | 2022 / Vol. 13390
Keywords
Named entity recognition and classification; Entity linking; Historical texts; Information extraction; Digitised newspapers; Digital humanities
DOI
10.1007/978-3-031-13643-6_26
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. This shared task is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of HIPE-2022, run as an evaluation lab of the CLEF 2022 conference, is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets. Tasks, corpora, and results of participating teams are presented.
Pages: 423-446
Page count: 24