A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 15
Authors
Boros, Emanuela [1 ]
Romero, Veronica [2 ]
Maarand, Martin [1 ]
Zenklova, Katerina [3 ]
Kreckova, Jitka [3 ]
Vidal, Enrique [2 ]
Stutzmann, Dominique [4 ]
Kermorvant, Christopher [1 ]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Source
2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020) | 2020
Keywords
Named entity recognition; Handwritten Text Recognition; historical document processing; multilingualism
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: the more commonly used sequential approach, which applies handwritten text recognition (HTR) and then named entity recognition (NER), and a combined approach, which transcribes the text-line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German, and Old Czech for name, date, and location recognition show that the combined approach performs better.
Pages: 79-84
Number of pages: 6
Related papers
50 in total
  • [21] Efficient combined approach for named entity recognition in spoken language
    Zidouni, Azeddine
    Rosset, Sophie
    Glotin, Herve
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1293 - +
  • [22] Distantly Supervised Named Entity Recognition Combined with Prototypical Networks
    Luo S.
    Lin Z.
    Pan L.
    Wu Z.
    Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2023, 43 (04): : 410 - 416
  • [23] Named entity recognition in medical domain combined with knowledge graph
    Jin Z.
    He X.
    Yue S.
    Xiong Y.
    Luo J.
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2023, 55 (05): : 50 - 58
  • [24] A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment
    Phan, Uyen T. P.
    Nguyen, Phuong N. V.
    Nguyen, Nhung T. H.
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3601 - 3609
  • [25] DrugSemantics: A corpus for Named Entity Recognition in Spanish Summaries of Product Characteristics
    Moreno, Isabel
    Boldrini, Ester
    Moreda, Paloma
    Teresa Roma-Ferri, M.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 72 : 8 - 22
  • [26] Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus
    Suriyachay, Kitiya
    Sornlertlamvanich, Virach
    2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 30 - 35
  • [27] UlyssesNER-Br: A Corpus of Brazilian Legislative Documents for Named Entity Recognition
    Albuquerque, Hidelberg O.
    Costa, Rosimeire
    Silvestre, Gabriel
    Souza, Ellen
    da Silva, Nadia F. F.
    Vitorio, Douglas
    Moriyama, Gyovana
    Martins, Lucas
    Soezima, Luiza
    Nunes, Augusto
    Siqueira, Felipe
    Tarrega, Joao P.
    Beinotti, Joao, V
    Dias, Marcio
    Silva, Matheus
    Gardini, Miguel
    Silva, Vinicius
    de Carvalho, Andre C. P. L. F.
    Oliveira, Adriano L., I
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 3 - 14
  • [28] Social Network Science Approaches for Disease Named Entity Recognition and Extraction
    Joshi, Sarvesh
    Kamath, Sowmya S.
    38TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN 2024, 2024, : 96 - 101
  • [29] Named Entity Recognition Algorithms Comparison For Judicial Text Data
    Aibek, Kuralbayev
    Bobur, Mukhsimbayev
    Abay, Bekbaganbetov
    Hajiyev, Fuad
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [30] A Comprehensive Study of Open-Source Libraries for Named Entity Recognition on Handwritten Historical Documents
    Monroc, Claire Bizon
    Miret, Blanche
    Bonhomme, Marie-Laurence
    Kermorvant, Christopher
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 429 - 444