Open-source OCR Engine Integration with Greek Dictionary

Cited by: 1
Authors
Alkiviadis, Tsimpiris [1]
Varsamis, Dimitrios [1]
Strouthopoulos, Charalampos [1]
Pavlidis, George [2]
Chairi, Kiourt [2]
Affiliations
[1] Int Hellen Univ, Dept Comp Informat & Telecommun Engn, Serres, Greece
[2] Univ Campus Kimmeria, ATHENA Res & Innovat Ctr Informat Commun & Knowle, Xanthi, Greece
Source
25TH PAN-HELLENIC CONFERENCE ON INFORMATICS WITH INTERNATIONAL PARTICIPATION (PCI2021) | 2021
Keywords
OCR; Tesseract; Greek; dictionary
DOI
10.1145/3503823.3503903
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The aim of this study is to evaluate an open-source OCR engine (Tesseract OCR ver. 4.0) after integrating a Greek dictionary of more than 500,000 words. To this end, an open-access dictionary was used as a starting point and enriched with words found in Greek restaurant menus. Training was applied to Tesseract's embedded LSTM deep learning model before the new Greek dictionary was integrated. OCR performance was evaluated with combinations of dictionaries on a total of 98 images of Greek catering menus. The results show a slight but stable improvement in OCR performance after training and integration of the new Greek dictionary.
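The two steps named in the abstract, enriching a base word list with menu vocabulary and scoring OCR output against ground truth, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state which metric was used, so character error rate based on Levenshtein edit distance is assumed here, and the function names `merge_wordlists`, `levenshtein`, and `character_error_rate` are hypothetical.

```python
import unicodedata


def merge_wordlists(base_words, extra_words):
    """Merge two word lists into one sorted, de-duplicated list.

    Words are stripped and NFC-normalized so that visually identical
    accented Greek words compare equal (an assumption of this sketch).
    """
    merged = set()
    for word in list(base_words) + list(extra_words):
        cleaned = unicodedata.normalize("NFC", word.strip())
        if cleaned:
            merged.add(cleaned)
    return sorted(merged)


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `merge_wordlists(["σαλάτα", "ψωμί"], ["ψωμί", " μουσακάς "])` yields three distinct words, and `character_error_rate(ground_truth, ocr_text)` could be averaged over a set of images to compare dictionary configurations.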
Pages: 436-441
Page count: 6