A Self-organizing Feature Map for Arabic Word Extraction

Times cited: 1
Authors
Bouressace, Hassina [1]
Csirik, Janos [1]
Affiliation
[1] University of Szeged, 13 Dugon Sq, H-6720 Szeged, Hungary
Source
Text, Speech, and Dialogue (TSD 2019) | 2019 / Vol. 11697
Keywords
Handwriting documents; Word segmentation; Neural network; Connected components; Text line; Segmentation
DOI
10.1007/978-3-030-27947-9_11
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Arabic word spotting is a key step in Arabic NLP and text recognition. Many recent studies have addressed segmentation problems in the Arabic language; however, many issues remain to be overcome. In this paper, we propose a new approach for segmenting an image of Arabic text into its constituent words. Our approach consists of two main steps. In the first step, a set of features is extracted from connected components using the run-length smoothing algorithm (RLSA). In the second step, spatially close connected components that are likely to belong to the same word are grouped together using a learning technique called the self-organizing feature map (Kohonen map). We evaluated our approach on 300 images of handwritten text of different sizes and fonts from the AHDB dataset. Our results suggest that the approach can efficiently segment text lines into words. Moreover, because it is based on a straightforward machine learning model, it should be possible to adapt it to other languages as well.
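The abstract names the two building blocks (RLSA and a Kohonen self-organizing map) but does not specify the exact features or how the map's output is turned into word boundaries. The Python sketch below is therefore an illustrative assumption, not the authors' implementation. It applies horizontal RLSA with a small threshold to join nearby fragments, extracts 8-connected components, and uses the horizontal gap between consecutive components as the only feature. It then trains a two-unit 1-D SOM and reads the lower-weight unit as "gap inside a word". All function names (`rlsa_horizontal`, `component_extents`, `train_som`, `segment_words`) and all parameter values are hypothetical.

```python
import numpy as np


def rlsa_horizontal(img, threshold):
    """Horizontal RLSA: fill background gaps of at most `threshold` pixels
    that lie between two foreground pixels on the same row."""
    out = img.copy()
    for r in range(img.shape[0]):
        cols = np.flatnonzero(img[r])  # foreground column indices in this row
        for a, b in zip(cols[:-1], cols[1:]):
            if 1 < b - a <= threshold + 1:  # gap length is b - a - 1
                out[r, a + 1:b] = 1
    return out


def component_extents(img):
    """Return sorted (x_min, x_max) extents of 8-connected foreground components."""
    h, w = img.shape
    seen = np.zeros(img.shape, dtype=bool)
    extents = []
    for r in range(h):
        for c in range(w):
            if not img[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            stack, x_min, x_max = [(r, c)], c, c
            while stack:  # iterative flood fill over 8-neighbours
                y, x = stack.pop()
                x_min, x_max = min(x_min, x), max(x_max, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
            extents.append((x_min, x_max))
    return sorted(extents)


def train_som(samples, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D Kohonen map on scalar samples and return the unit weights."""
    rng = np.random.default_rng(seed)
    x = np.asarray(samples, dtype=float)
    weights = np.linspace(x.min(), x.max(), n_units)  # spread units over the data range
    units = np.arange(n_units)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)  # linearly decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 1e-3)  # shrinking neighbourhood width
        for s in rng.permutation(x):
            bmu = np.argmin(np.abs(weights - s))  # best-matching unit
            h = np.exp(-((units - bmu) ** 2) / (2 * sigma**2))  # Gaussian neighbourhood
            weights += lr * h * (s - weights)
    return weights


def segment_words(img, rlsa_threshold=1):
    """Hypothetical pipeline: RLSA -> components -> SOM over gaps -> word groups."""
    extents = component_extents(rlsa_horizontal(img, rlsa_threshold))
    if not extents:
        return [], []
    if len(extents) == 1:
        return extents, [[0]]
    gaps = [b[0] - a[1] - 1 for a, b in zip(extents, extents[1:])]
    weights = train_som(gaps)
    small = np.argmin(weights)  # unit interpreted as "gap inside a word"
    words = [[0]]
    for i, g in enumerate(gaps, start=1):
        if np.argmin(np.abs(weights - g)) == small:
            words[-1].append(i)  # small gap: same word as the previous component
        else:
            words.append([i])  # large gap: start a new word
    return extents, words


# Toy text line: six ink blocks; RLSA merges the first two (1-pixel gap).
img = np.zeros((5, 42), dtype=np.uint8)
for a, b in [(0, 2), (4, 5), (9, 11), (21, 23), (27, 29), (39, 41)]:
    img[1:4, a:b + 1] = 1
extents, words = segment_words(img)
print(extents)
print(words)
```

On this toy line, RLSA fuses the first two blocks. That leaves five components separated by gaps of 3, 9, 3 and 9 pixels. The map separates the two gap sizes, and the components are grouped into three words, `[[0, 1], [2, 3], [4]]`. Components are indexed left to right, so the word list would be reversed to follow Arabic reading order.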
Pages: 127-136
Number of pages: 10