IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach

Cited by: 26
Authors
Elzobi, Moftah [1]
Al-Hamadi, Ayoub [1]
Al Aghbari, Zaher [2]
Dings, Laslo [1]
Affiliations
[1] Inst Elect Signal Proc & Commun IESK, Magdeburg, Germany
[2] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Keywords
Arabic OCR; Off-line handwriting recognition; Handwritten Arabic database; Handwriting segmentation; Baseline estimation; CHARACTER-RECOGNITION; STRATEGIES
DOI
10.1007/s10032-012-0190-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Even though much research has been conducted on unconstrained handwriting recognition, an effective solution remains a serious challenge. In this article, we address two issues related to Arabic handwriting recognition. First, we present IESK-arDB, a new multi-purpose off-line Arabic handwritten database. It is publicly available and contains more than 4,000 word images, each accompanied by a binary version, a thinned version, and ground-truth information stored in a separate XML file. Additionally, it contains around 6,000 character images segmented from the database. A letter frequency analysis showed that the database exhibits letter frequencies similar to those of large corpora of digital text, which supports the usefulness of the database. Second, we propose a multi-phase segmentation approach that starts by detecting and resolving sub-word overlaps, then hypothesizes a large number of segmentation points that are later reduced by a set of heuristic rules. The proposed approach has been successfully tested on IESK-arDB. The results were very promising, indicating the efficiency of the suggested approach.
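The letter frequency analysis mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the choice of cosine similarity as the comparison measure, and the idea of feeding in plain word lists are all assumptions made for illustration, since the abstract does not say how the comparison was carried out.

```python
import math
from collections import Counter


def letter_frequencies(words):
    """Return the relative frequency of each character in a list of words.

    Hypothetical helper: the words would come from the database's
    ground-truth transcriptions or from a reference text corpus.
    """
    counts = Counter(ch for word in words for ch in word if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two frequency tables (assumed comparison measure).

    Returns 1.0 for identical distributions and 0.0 when the two tables
    share no letters or when either table is empty.
    """
    letters = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(ch, 0.0) * freq_b.get(ch, 0.0) for ch in letters)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder word lists, not taken from IESK-arDB or any real corpus.
    database_words = ["كتاب", "باب", "بيت"]
    corpus_words = ["كتب", "بيت", "باب", "تاب"]
    similarity = cosine_similarity(
        letter_frequencies(database_words),
        letter_frequencies(corpus_words),
    )
    print(f"letter-frequency similarity: {similarity:.3f}")
```

A value close to 1 would suggest that the letter distribution of the word list resembles that of the reference text. Other measures, such as a chi-squared statistic, could be substituted without changing the structure of the sketch.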
Pages: 295-308
Page count: 14