A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 15
Authors
Boros, Emanuela [1]
Romero, Veronica [2]
Maarand, Martin [1]
Zenklova, Katerina [3]
Kreckova, Jitka [3]
Vidal, Enrique [2]
Stutzmann, Dominique [4]
Kermorvant, Christopher [1]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Source
2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020) | 2020
Keywords
Named entity recognition; Handwritten text recognition; Historical document processing; Multilingualism
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: a sequential approach, the more common of the two, which applies handwritten text recognition (HTR) followed by named entity recognition (NER); and a combined approach, which transcribes the text-line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German, and Old Czech for name, date, and location recognition show that the combined approach performs better.
Pages: 79-84
Page count: 6