metaGraphos: a Web-based system for transcribing, proofreading and publishing scanned documents

Cited by: 0
Authors
Varthis, Evagelos [1]
Poulos, Marios [1]
Affiliations
[1] Ionian Univ, Dept Arch, Lib Sci & Museums, Corfu, Greece
Keywords
Semantic enhancement; Transcription & OCR correction; GitLab platform; Web publishing; Crowdsourcing system; Parallel collaboration; Handwritten text recognition; Models
DOI
10.1108/CC-01-2023-0002
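Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract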
Purpose: This study presents metaGraphos, a crowdsourcing system that aids in the transcription and semantic enhancement of scanned documents by using a pool of volunteers or people willing to participate in exchange for a financial reward.
Design/methodology/approach: metaGraphos can be used where optical character recognition fails to produce satisfactory results, where semantic tagging or assigning thematic headings to texts is considered necessary, or where ground-truth data has to be collected in raw form.
Findings: The system automatically provides a Web-based interface, comprising a static HTML page and JavaScript code, that displays the scanned images of the document side by side with the corresponding incomplete texts, allowing users to correct or complete the texts in parallel.
Social implications: By assisting the parallel transcription and semantic enhancement of difficult scanned documents, the system further reveals hidden cultural wealth and aids knowledge dissemination, which contributes significantly to academic and scientific dialog and feedback.
Originality/value: Individual researchers, libraries and organizations in general may benefit from the system because it is a cost-effective, practical and easy-to-set-up client-server architecture that provides a reliable way to transcribe texts or revise transcriptions on a large scale.
Pages: 101-110
Page count: 10