Arabic Optical Character Recognition: A Review

Cited by: 11
Authors
Alghyaline, Salah [1]
Affiliation
[1] World Islamic Sci & Educ Univ, Dept Comp Sci, Amman 110111947, Jordan
Source
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2023, Vol. 135, No. 3
Keywords
Arabic Optical Character Recognition (OCR); Arabic OCR software; Arabic OCR datasets; Arabic OCR evaluation; SEGMENTATION-FREE; TEXT RECOGNITION; TRANSFORM; SCRIPTS; SYSTEM; ROBUST; MODEL;
DOI
10.32604/cmes.2022.024555
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08 ;
Abstract
This study reviews the contributions made to Arabic Optical Character Recognition (OCR) over the last decade, to help interested researchers understand the existing techniques and extend or adapt them. The study describes the characteristics of the Arabic language, the different types of OCR systems, the stages of an Arabic OCR system, researchers' contributions at each stage, and the evaluation metrics for OCR. It also reviews the existing Arabic OCR datasets and their characteristics. Additionally, the study implements some of the preprocessing and segmentation stages of Arabic OCR and compares the performance of existing methods in terms of recognition accuracy. In addition to researchers' OCR methods, commercial and open-source systems are included in the comparison. The Arabic language is morphologically rich and is written cursively, with dots and diacritics above and below the characters. Most existing approaches in the literature were evaluated on isolated characters or isolated words in a controlled environment, and few were tested on page-level scripts. Some comparative studies show that the accuracy of existing commercial Arabic OCR systems is low, under 75% for printed text, and that further improvement is needed. Moreover, most current approaches are offline OCR systems, and there has been no remarkable contribution to online OCR systems.
Pages: 1825-1861
Number of pages: 37