Optical Character Recognition for Quranic Image Similarity Matching

Cited by: 12
Authors
Alotaibi, Faiz [1]
Abdullah, Muhamad Taufik [1]
Abdullah, Rusli Bin Hj [1]
Rahmat, Rahmita Wirza Binti O. K. [1]
Hashem, Ibrahim Abaker Targio [2]
Sangaiah, Arun Kumar [3]
Affiliations
[1] Univ Putra Malaysia, Fac Comp Sci & Informat Technol, Serdang 43400, Malaysia
[2] Asia Pacific Univ Technol & Innovat Technol, Dept Comp Technol, Kuala Lumpur 57000, Malaysia
[3] VIT Univ, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
Image processing; character recognition; Quranic diacritics; knn; optimization
DOI
10.1109/ACCESS.2017.2771621
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Optical character recognition (OCR) is the detection and recognition of characters in an image and their conversion into text. Arabic OCR is a distinct type of OCR designed to process Arabic characters. OCR is increasingly used in applications where a task should be performed automatically, without human intervention. Quranic text contains two elements, namely, diacritics and characters, and processing both can cause an OCR system to malfunction and reduce its accuracy. In this paper, a new method is proposed to check the similarity and originality of Quranic content. The method combines Quranic diacritic and character recognition techniques. Diacritics are detected using a region-based algorithm, and characters are recognized using the projection method; in each case, an optimization technique is applied to increase the recognition ratio. The result of the proposed method is compared with the standard Mushaf al Madinah benchmark to find similarities that match the text of the Holy Quran. The obtained accuracy was superior to that of the tested K-nearest neighbor (knn) algorithm and to published results in the literature. The improved knn algorithm reached accuracies of 96.4286% for diacritics and 92.3077% for characters, outperforming the standard knn algorithm.
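As a rough illustration of two ideas named in the abstract, the sketch below computes projection profiles for a small binary glyph and classifies them with a plain K-nearest-neighbor vote. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not describe the paper's features, its optimization step, or its improved knn, and the function names, toy 3x3 glyphs, and labels here are all hypothetical.

```python
from collections import Counter
from math import dist


def projection_profile(glyph):
    """Return row sums followed by column sums of a binary glyph.

    `glyph` is a list of equal-length rows of 0/1 pixels (an assumed toy
    representation, not the paper's image format).
    """
    row_sums = [sum(row) for row in glyph]
    col_sums = [sum(col) for col in zip(*glyph)]
    return row_sums + col_sums


def knn_predict(train, query, k=3):
    """Plain knn: majority label among the k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    This is the standard algorithm, not the improved variant in the paper.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy glyphs: two vertical strokes and two horizontal strokes.
VERTICAL = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
VERTICAL_NOISY = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
HORIZONTAL = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
HORIZONTAL_NOISY = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]

TRAIN = [
    (projection_profile(VERTICAL), "vertical"),
    (projection_profile(VERTICAL_NOISY), "vertical"),
    (projection_profile(HORIZONTAL), "horizontal"),
    (projection_profile(HORIZONTAL_NOISY), "horizontal"),
]

# An unseen glyph that is mostly a vertical stroke.
QUERY = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
PREDICTED = knn_predict(TRAIN, projection_profile(QUERY), k=3)
```

With these toy inputs the query lies closest to the two vertical-stroke samples, so the three-neighbor vote favors the "vertical" label. A real system would first segment characters and diacritics from the page image before extracting such features.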
Pages: 554-562
Page count: 9