A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 19
Authors
Boros, Emanuela [1 ]
Romero, Veronica [2 ]
Maarand, Martin [1 ]
Zenklova, Katerina [3 ]
Kreckova, Jitka [3 ]
Vidal, Enrique [2 ]
Stutzmann, Dominique [4 ]
Kermorvant, Christopher [1 ]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Source
2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020) | 2020
Keywords
Named entity recognition; Handwritten Text Recognition; historical document processing; multilingualism;
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: the more commonly used sequential approach, which applies handwritten text recognition (HTR) and then named entity recognition (NER), and a combined approach, which transcribes the text line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German and Old Czech for name, date and location recognition show that the combined approach performs better.
Pages: 79-84
Page count: 6
Related papers
50 in total
[21]   A web-based Bengali news corpus for named entity recognition [J].
Ekbal, Asif ;
Bandyopadhyay, Sivaji .
LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (02) :173-182
[22]   Big Data and Named Entity Recognition Approaches for Urdu Language [J].
Jamil, Qudsia ;
Zafar, Muhammad Rehman .
EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2018, 4 (16) :1-5
[23]   Clinical named-entity recognition: A short comparison [J].
Lossio-Ventura, Juan Antonio ;
Boussard, Sebastien ;
Morzan, Juandiego ;
Hernandez-Boussard, Tina .
2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, :1548-1550
[24]   Efficient combined approach for named entity recognition in spoken language [J].
Zidouni, Azeddine ;
Rosset, Sophie ;
Glotin, Herve .
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, :1293-+
[25]   Distantly Supervised Named Entity Recognition Combined with Prototypical Networks [J].
Luo S. ;
Lin Z. ;
Pan L. ;
Wu Z. .
Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2023, 43 (04) :410-416
[26]   Named entity recognition in medical domain combined with knowledge graph [J].
Jin Z. ;
He X. ;
Yue S. ;
Xiong Y. ;
Luo J. .
Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2023, 55 (05) :50-58
[27]   A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment [J].
Phan, Uyen T. P. ;
Nguyen, Phuong N. V. ;
Nguyen, Nhung T. H. .
LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, :3601-3609
[28]   UlyssesNER-Br: A Corpus of Brazilian Legislative Documents for Named Entity Recognition [J].
Albuquerque, Hidelberg O. ;
Costa, Rosimeire ;
Silvestre, Gabriel ;
Souza, Ellen ;
da Silva, Nadia F. F. ;
Vitorio, Douglas ;
Moriyama, Gyovana ;
Martins, Lucas ;
Soezima, Luiza ;
Nunes, Augusto ;
Siqueira, Felipe ;
Tarrega, Joao P. ;
Beinotti, Joao, V ;
Dias, Marcio ;
Silva, Matheus ;
Gardini, Miguel ;
Silva, Vinicius ;
de Carvalho, Andre C. P. L. F. ;
Oliveira, Adriano L., I .
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 :3-14
[29]   Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus [J].
Suriyachay, Kitiya ;
Sornlertlamvanich, Virach .
2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, :30-35
[30]   DrugSemantics: A corpus for Named Entity Recognition in Spanish Summaries of Product Characteristics [J].
Moreno, Isabel ;
Boldrini, Ester ;
Moreda, Paloma ;
Teresa Roma-Ferri, M. .
JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 72 :8-22