A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 19
Authors
Boros, Emanuela [1]
Romero, Veronica [2]
Maarand, Martin [1]
Zenklova, Katerina [3]
Kreckova, Jitka [3]
Vidal, Enrique [2]
Stutzmann, Dominique [4]
Kermorvant, Christopher [1]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Source
2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020) | 2020
Keywords
Named entity recognition; Handwritten Text Recognition; historical document processing; multilingualism;
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: the more commonly used sequential approach, which applies handwritten text recognition (HTR) and then named entity recognition (NER), and a combined approach, which transcribes the text line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German, and Old Czech for name, date, and location recognition show that the combined approach performs better.
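The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the recognizers are toy stand-ins, the sample line is invented, and the inline-tag output format assumed for the combined model (`<name>...</name>`) is an illustrative assumption that the abstract does not specify.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (label, surface text)

def sequential(line_image: object,
               htr: Callable[[object], str],
               ner: Callable[[str], List[Entity]]) -> Tuple[str, List[Entity]]:
    """Sequential approach: transcribe first, then tag the transcript."""
    text = htr(line_image)
    return text, ner(text)

# Assumed output format of a combined model: entity tags inside the transcript.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")

def combined(line_image: object,
             tagged_htr: Callable[[object], str]) -> Tuple[str, List[Entity]]:
    """Combined approach: one model emits text and entity tags together."""
    tagged = tagged_htr(line_image)
    entities = [(m.group(1), m.group(2)) for m in TAG.finditer(tagged)]
    plain = TAG.sub(r"\2", tagged)  # strip tags to recover the transcript
    return plain, entities

# Hypothetical stand-ins for trained models (invented data, for illustration).
def toy_htr(_image: object) -> str:
    return "Karolus rex in Praga"

def toy_ner(text: str) -> List[Entity]:
    gazetteer = {"Karolus": "name", "Praga": "location"}
    return [(gazetteer[t], t) for t in text.split() if t in gazetteer]

def toy_tagged_htr(_image: object) -> str:
    return "<name>Karolus</name> rex in <location>Praga</location>"

if __name__ == "__main__":
    print(sequential(None, toy_htr, toy_ner))
    print(combined(None, toy_tagged_htr))
```

In the sequential pipeline, any transcription error made by the HTR step reaches the NER step unchanged, whereas the combined variant decides on text and entities in a single pass.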
Pages: 79-84
Number of pages: 6