Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers

Cited by: 9
Authors
Ehrmann, Maud [1]
Romanello, Matteo [1]
Bircher, Stefan [2]
Clematide, Simon [2]
Affiliations
[1] Ecole Polytech Fed Lausanne, Digital Humanities Lab, Lausanne, Switzerland
[2] Univ Zurich, Inst Computat Linguist, Zurich, Switzerland
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2020, PT II | 2020 / Vol. 12036
Funding
Swiss National Science Foundation;
Keywords
Named entity processing; Text understanding; Information extraction; Historical newspapers; Digital Humanities;
DOI
10.1007/978-3-030-45442-5_68
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends characterise its developments: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. If NE processing tools are increasingly being used in the context of historical documents, performance values are below the ones on contemporary data and are hardly comparable. In this context, this paper introduces the CLEF 2020 Evaluation Lab HIPE (Identifying Historical People, Places and other Entities) on named entity recognition and linking on diachronic historical newspaper material in French, German and English. Our objective is threefold: strengthening the robustness of existing approaches on non-standard inputs, enabling performance comparison of NE processing on historical texts, and, in the long run, fostering efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.
Pages: 524-532
Page count: 9