A Novel Approach to Printed Arabic Optical Character Recognition

Author: Mansoor A. Al Ghamdi
Affiliation: University of Tabuk, Department of Computer Science, Community College
Source: Arabian Journal for Science and Engineering, 2022, Vol. 47
Keywords: Optical character recognition; OCR; Arabic printed OCR; Arabic text recognition; Feature extraction; Segmentation; Classification
DOI: Not available
Abstract
Optical character recognition (OCR) is widely used in real-world applications such as digitizing learning resources, assisting visually impaired people, and transforming printed resources into electronic media. For the Arabic language, the need to expand digital Arabic content on the Internet has recently motivated researchers to focus on Arabic text recognition. Despite the considerable number of studies on Arabic OCR, it still faces numerous challenges owing to the special characteristics of the Arabic script. This research aims to develop an effective printed Arabic OCR system, and this work describes its implementation. The system is divided into four stages: pre-processing, feature extraction, character segmentation, and classification. Unlike typical Arabic OCR systems, the developed system performs feature extraction before character segmentation. In the pre-processing stage, a novel thinning algorithm is applied to produce skeletons of the Arabic text images. In the second stage, a new chain code representation technique that uses an agent-based model is proposed for extracting features from non-dotted Arabic text images. Relying on the extracted features, a character segmentation technique is introduced that segments connected Arabic words into characters. In the classification stage, the prediction by partial matching (PPM) compression-based method is applied as a classifier to recognize the Arabic text. An experimental evaluation of Arabic OCR systems on a public dataset shows that the developed system achieves an accuracy of 77.3% on paragraph-based text images.
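Compression-based classification rests on one idea: a string is encoded in the fewest bits by a model trained on strings of the same class. The sketch below illustrates only that idea. It assumes a simplified PPM variant (escape method C, no exclusions, static models). It also assumes that the classifier operates on strings of Freeman 8-direction chain-code digits. The class `PPMModel`, the function `classify`, the labels "A"/"B" and the training strings are all hypothetical. They do not reproduce the author's implementation or data.

```python
import math
from collections import Counter, defaultdict


class PPMModel:
    """Hypothetical, simplified PPM model: escape method C, no exclusions,
    static (trained once, not updated while coding)."""

    def __init__(self, alphabet, max_order=2):
        self.alphabet_size = len(set(alphabet))
        self.max_order = max_order
        # counts[context][symbol] = how often `symbol` followed `context`
        self.counts = defaultdict(Counter)

    def train(self, text):
        """Record every symbol under each of its contexts of order 0..max_order."""
        for i, sym in enumerate(text):
            for order in range(min(self.max_order, i) + 1):
                self.counts[text[i - order:i]][sym] += 1

    def symbol_bits(self, history, sym):
        """Bits needed to encode `sym` after `history`, backing off from the
        longest context down to order -1 (uniform over the alphabet)."""
        bits = 0.0
        for order in range(min(self.max_order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - order:])
            if not table:
                continue  # context never seen: back off at no cost
            total, distinct = sum(table.values()), len(table)
            if sym in table:
                # Method C: P(sym) = count / (total + distinct)
                return bits - math.log2(table[sym] / (total + distinct))
            # Method C escape: P(escape) = distinct / (total + distinct)
            bits -= math.log2(distinct / (total + distinct))
        return bits + math.log2(self.alphabet_size)

    def code_length(self, text):
        """Total bits to encode `text` with this static model."""
        return sum(self.symbol_bits(text[:i], s) for i, s in enumerate(text))


def classify(text, models):
    """Return the label whose model encodes `text` in the fewest bits."""
    return min(models, key=lambda label: models[label].code_length(text))


# Hypothetical training data: chain-code digit strings for two made-up classes.
DIRECTIONS = "01234567"
training = {"A": ["0123401234", "012340"], "B": ["7654376543"]}
models = {}
for label, samples in training.items():
    model = PPMModel(DIRECTIONS, max_order=2)
    for sample in samples:
        model.train(sample)
    models[label] = model

print(classify("01234", models))  # A
```

The label with the smallest code length is the model that assigns the string the highest probability, because code length equals -log2 of that probability. A full PPM coder would also apply symbol exclusions and update its counts adaptively as it encodes.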
Pages: 2219–2237 (19 pages)