Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system

Cited by: 2
Authors
Al-Barhamtoshy, Hassanin M. [1 ]
Jambi, Kamal M. [1 ]
Ahmed, Hany [2 ]
Mohamed, Shaimaa [3 ]
Abdo, Sherif M. [2 ]
Rashwan, Mohsen A. [3 ]
Affiliations
[1] King Abdulaziz Univ, Dept Informat Technol, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[2] Cairo Univ, Fac Comp & Informat Syst, Cairo, Egypt
[3] Cairo Univ, Elect & Commun Dept, Cairo, Egypt
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2019, Vol. 12, No. 02
Keywords
ARABIC OCR; SEGMENTATION; FEATURE EXTRACTION; CALLIGRAPHY; TYPEWRITTEN; HANDWRITTEN; HMM
DOI
10.21786/bbrc/12.2/11
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
This paper describes an Omni OCR system for recognizing typewritten and handwritten Arabic text documents. The proposed Arabic OCR system consists of four main phases. The first is the pre-processing phase, which covers binarization, skew correction, framing, and noise removal for the prepared documents (the dataset). The second phase segments the preprocessed documents into lines and words; it involves two main tasks: the language model with the Arabic dictionary in use, and the detection of segmented lines and words. The third is the feature extraction phase, which extracts features for each segmented line or word according to the language model in use. Finally, the classifier (recognizer) converts each word or line into a text stream. The four phases are then evaluated to measure the accuracy of the Arabic OCR system. The recognition approach is based on Hidden Markov Models (HMM); the prepared datasets and the software development tool are also introduced and discussed. State-of-the-art OCR systems can now reach an accuracy of 70% on unconstrained Arabic text, but this is still far from what many useful applications require. This paper therefore describes a proposed approach based on a language model with ligatures and overlapping characters for the proposed Arabic OCR. A posterior word-based approach with a tri-gram model is used to recognize the Arabic text. Features are extracted from images of words and from patterns generated using the proposed solution. We test the proposed OCR system on different categories of Arabic documents: early printed or typewritten, printed, historical, and calligraphy documents. On our test bed, the system gives a 12.5% character error rate, compared against the best of the other OCR systems.
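The abstract reports its result as a character error rate (CER). The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of the commonly used definition: the Levenshtein edit distance between the reference text and the OCR output, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative and not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))  # distances for the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character out of eight gives a CER of 0.125 (12.5%).
    print(character_error_rate("abcdefgh", "abcdefgx"))
```

Python strings are sequences of Unicode code points, so the same functions apply to Arabic text directly. Whether diacritics or ligatures are normalized before scoring is a protocol choice that this sketch leaves open.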
Pages: 283-296
Number of pages: 14