Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system

Cited by: 2
Authors
Al-Barhamtoshy, Hassanin M. [1]
Jambi, Kamal M. [1]
Ahmed, Hany [2]
Mohamed, Shaimaa [3]
Abdo, Sherif M. [2]
Rashwan, Mohsen A. [3]
Affiliations
[1] King Abdulaziz Univ, Dept Informat Technol, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[2] Cairo Univ, Fac Comp & Informat Syst, Cairo, Egypt
[3] Cairo Univ, Elect & Commun Dept, Cairo, Egypt
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2019 / Vol. 12 / No. 02
Keywords
ARABIC OCR; SEGMENTATION; FEATURE EXTRACTION; CALLIGRAPHY; TYPEWRITTEN; HANDWRITTEN; HMM;
DOI
10.21786/bbrc/12.2/11
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
This paper describes an Omni OCR system for recognizing typewritten and handwritten Arabic text documents. The proposed Arabic OCR system is divided into four main phases. The first phase is pre-processing; it covers binarization, skew correction, framing, and noise removal for the prepared documents (dataset). The second phase segments the preprocessed documents into lines and words. Two main tasks are addressed during this phase: the language model with the Arabic dictionary in use, and the detection of segmented lines and segmented words. The third phase is feature extraction; it extracts features for each segmented line/word according to the language model in use. Finally, the classifier (recognizer) converts each word/line into a text stream. The four phases are then evaluated to measure the accuracy of the Arabic OCR system. The recognition approach is based on Hidden Markov Models (HMM); the prepared datasets and the software development tool are introduced and discussed. State-of-the-art OCR systems can now reach an accuracy of 70% on unconstrained Arabic text. However, this is still far from what many useful applications require. This paper therefore describes an approach based on a language model with ligatures and overlapping characters for the proposed Arabic OCR. A posterior word-based approach is used with a tri-gram model to recognize the Arabic text. Features are extracted from word images and generated patterns using the proposed solution. We test the proposed OCR system on different categories of Arabic documents: early printed or typewritten, printed, historical, and calligraphy documents. The test bed of our system gives a 12.5% character error rate compared with the best OCR among other systems.
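The abstract reports its result as a character error rate (CER). The record does not say how the authors computed it; the sketch below assumes the common definition, the Levenshtein edit distance between the reference text and the OCR output divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character in an eight-character reference.
    print(character_error_rate("abcdefgh", "abcdefgx"))
```

Python compares strings by Unicode code point, so Arabic text works unchanged, but a base letter and a separately encoded diacritic count as two characters under this sketch. Whether the paper counts them that way is not stated.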
Pages: 283-296
Page count: 14
Related papers
50 in total
  • [31] Feature Extraction Using Geometrical Features for Malayalam Handwritten Character Recognition System
    Thushara, K.
    James, Ajay
    Saravanan, C.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 477 - 482
  • [32] Printed Arabic character recognition using HMM
    Hassin, AH
    Tang, XL
    Liu, JF
    Zhao, W
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2004, 19 (04) : 538 - 543
  • [33] Printed Arabic character recognition using HMM
    Abbas H. Hassin
    Xiang-Long Tang
    Jia-Feng Liu
    Wei Zhao
    Journal of Computer Science and Technology, 2004, 19 : 538 - 543
  • [34] Design and Evaluation of Arabic Handwritten Digit Recognition System Using Biologically Plausible Methods
    Hussain, Nadir
    Ali, Mushtaq
    Syed, Sidra Abid
    Ghoniem, Rania M.
    Ejaz, Nazia
    Alramli, Omar Imhemed
    Ala'anzy, Mohammed Alaa
    Ahmad, Zulfiqar
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024, 49 (09) : 12509 - 12523
  • [35] Unconstrained handwritten character recognition using metaclasses of characters
    Koerich, AL
    Kalva, PR
    2005 International Conference on Image Processing (ICIP), Vols 1-5, 2005, : 2045 - 2048
  • [36] Hand-printed Arabic character recognition system using an artificial network
    Amin, A
    AlSadoun, H
    Fischer, S
    PATTERN RECOGNITION, 1996, 29 (04) : 663 - 675
  • [37] Offline Handwritten Sanskrit Simple and Compound Character Recognition Using Neural Network
    Mehta, Jyoti
    Garg, Naresh
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT, ICT4SD 2015, VOL 1, 2016, 408 : 597 - 605
  • [38] A Robust System for Online Handwritten Chinese/Japanese Character Recognition
    Zhu, B. L.
    Nakagawa, Masaki
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND INFORMATION TECHNOLOGY (SEIT2015), 2016, : 247 - 254
  • [39] Hybrid Arabic handwritten character segmentation using CNN and graph theory algorithm
    Berriche, Lamia
    Alqahtani, Ashjan
    Rekik, Siwar
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (01)
  • [40] A fuzzy expert system for recognition of handwritten Arabic sub-words
    Khedher, Mohammed Zeki
    Al-Talib, Ghayda
    2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007, : 324 - +