Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system

Cited by: 2
Authors
Al-Barhamtoshy, Hassanin M. [1 ]
Jambi, Kamal M. [1 ]
Ahmed, Hany [2 ]
Mohamed, Shaimaa [3 ]
Abdo, Sherif M. [2 ]
Rashwan, Mohsen A. [3 ]
Affiliations
[1] King Abdulaziz Univ, Dept Informat Technol, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[2] Cairo Univ, Fac Comp & Informat Syst, Cairo, Egypt
[3] Cairo Univ, Elect & Commun Dept, Cairo, Egypt
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2019 / Vol. 12 / Issue 02
Keywords
ARABIC OCR; SEGMENTATION; FEATURE EXTRACTION; CALLIGRAPHY; TYPEWRITTEN; HANDWRITTEN; HMM;
DOI
10.21786/bbrc/12.2/11
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
This paper describes an Omni OCR system for recognizing typewritten and handwritten Arabic text documents. The proposed Arabic OCR system is organized into four main phases. The first phase is pre-processing; it covers binarization, skew correction, framing, and noise removal for the prepared documents (dataset). The second phase segments the preprocessed documents into lines and words. Two main tasks are addressed during this phase: building the language model with the Arabic dictionary in use, and detecting the segmented lines and words. The third phase is feature extraction; it extracts features for each segmented line or word according to the language model in use. Finally, the classifier (recognizer) converts each word or line into a text stream. The four phases are then evaluated to measure the accuracy of the Arabic OCR system. The recognition approach is based on Hidden Markov Models (HMM); the prepared datasets and the software development tool are also introduced and discussed. State-of-the-art OCR systems can now reach an accuracy of 70% for unconstrained Arabic texts. However, this level is still far from what many useful applications require. This paper therefore proposes an approach based on a language model with ligatures and overlapping characters for the proposed Arabic OCR. A posterior word-based approach is used with a tri-gram model to recognize the Arabic text. Features are extracted from images of words and from patterns generated using the proposed solution. We test the proposed OCR system on different categories of Arabic documents: early printed or typewritten, printed, historical, and calligraphy documents. On our test bed, the system gives a 12.5% character error rate compared to the best OCR of other systems.
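The abstract reports its result as a character error rate (CER) but does not state how it was computed. A common definition, assumed here, is the character-level edit distance between the recognized text and the reference transcription, divided by the reference length. The sketch below illustrates that definition only; the function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed CER definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One dropped letter in a four-letter reference word.
print(character_error_rate("كتاب", "كتب"))  # 0.25
```

Under this definition, a CER of 12.5% corresponds to one character error per eight reference characters on average.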
Pages: 283 - 296
Number of pages: 14
Related papers
50 in total
  • [41] Malayalam Handwritten Character Recognition Using Convolutional Neural Network
    Nair, Pranav P.
    James, Ajay
    Saravanan, C.
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT), 2017, : 278 - 281
  • [42] Handwritten Character Recognition using Machine Learning Approach - A Survey
    Patel, Shivangkumar R.
    Jha, Jasmine
    2015 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, SIGNALS, COMMUNICATION AND OPTIMIZATION (EESCO), 2015,
  • [43] On-Line Handwritten Character Recognition using Kohonen Networks
    Sreeraj, M.
    Idicula, Sumam Mary
    2009 WORLD CONGRESS ON NATURE & BIOLOGICALLY INSPIRED COMPUTING (NABIC 2009), 2009, : 1424 - 1429
  • [44] Online handwritten Gurmukhi character recognition using elastic matching
    Sharma, Anuj
    Kumar, Rajesh
    Sharma, R. K.
    CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 2, PROCEEDINGS, 2008, : 391 - +
  • [45] Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG)
    Jebril N.A.
    Al-Zoubi H.R.
    Abu Al-Haija Q.
    Pattern Recognition and Image Analysis, 2018, 28 (02) : 321 - 345
  • [46] Recognition of Handwritten Arabic and Hindi Numerals Using Convolutional Neural Networks
    Alqudah, Amin
    Alqudah, Ali Mohammad
    Alquran, Hiam
    Al-Zoubi, Hussein R.
    Al-Qodah, Mohammed
    Al-Khassaweneh, Mahmood A.
    APPLIED SCIENCES-BASEL, 2021, 11 (04): : 1 - 30
  • [47] Recognition of handwritten Arabic words using a neuro-fuzzy network
    Boukharouba, Abdelhak
    Bennia, Abdelhak
    INTELLIGENT SYSTEMS AND AUTOMATION, 2008, 1019 : 254 - +
  • [48] Recognition of On-line Arabic Handwritten Characters Using Structural Features
    Al-Taani, Ahmad T.
    Al-Haj, Saeed
    JOURNAL OF PATTERN RECOGNITION RESEARCH, 2010, 5 (01): : 23 - 37
  • [49] Recognition of Offline Handwritten Arabic Words Using a Few Structural Features
    Saidi, Abderrahmane
    Lakhdar, Abdelmouneim Moulay
    Beladgham, Mohammed
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (03): : 2875 - 2889
  • [50] A novel SVM-based handwritten Tamil character recognition system
    Shanthi, N.
    Duraiswamy, K.
    PATTERN ANALYSIS AND APPLICATIONS, 2010, 13 (02) : 173 - 180