Word-Level Script Identification Using Texture Based Features

Cited by: 3

Authors:

Singh, Pawan Kumar [1]

Sarkar, Ram [1]

Nasipuri, Mita [1]

Affiliations:

[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India

Source:

INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS | 2015, Vol. 4, No. 2

Keywords:

Handwritten Words; Histograms of Oriented Gradients (HOG); Indic Scripts; Moment Invariant Features; Optical Character Recognition; Script Identification; Statistical Significance Testing;

DOI:

10.4018/ijsda.2015040105

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Script identification has been an appealing research interest in the field of document image analysis during the last few decades. Accurate recognition of the script is paramount to many post-processing steps, such as automated document sorting, machine translation, and searching for text written in a particular script in a multilingual environment. For automatic processing of such documents through Optical Character Recognition (OCR) software, it is necessary to identify the script of each word in a document before feeding it to the OCR engine for that script. In this paper, a robust word-level handwritten script identification technique is proposed that uses texture-based features to identify words written in any of seven popular scripts, namely Bangla, Devanagari, Gurumukhi, Malayalam, Oriya, Telugu, and Roman. The texture-based features comprise a combination of Histograms of Oriented Gradients (HOG) and moment invariants. The technique has been tested on 7000 handwritten words, with each script contributing 1000 words. Based on the identification accuracies and statistical significance testing of seven well-known classifiers, the Multi-Layer Perceptron (MLP) has been chosen as the final classifier, which is then tested comprehensively using different numbers of folds and different epoch sizes. The overall accuracy of the system is found to be 94.7% using a 5-fold cross-validation scheme, which is quite impressive considering the complexities and shape variations of the said scripts. This is an extended version of the paper described in (Singh et al., 2014).
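
The abstract names moment invariants as one half of the feature set but does not say which invariants are used. As a rough illustration only, the sketch below computes the first two of Hu's moment invariants for a small binary word image and checks that they do not change when the image is shifted or rotated by 90 degrees. The choice of Hu's invariants, the function names (`hu_first_two`, `pad`, `rot90`), and the toy `stroke` image are all assumptions made for this example; the HOG features and the MLP classifier are not shown.

```python
def hu_first_two(img):
    """Return Hu's first two moment invariants (phi1, phi2) of a 2-D grid.

    `img` is a list of rows of non-negative pixel values (hypothetical input
    format chosen for this sketch, not the paper's).
    """
    m00 = sum(v for row in img for v in row)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")

    # Centroid of the pixel mass.
    xc = sum(x * v for row in img for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(img) for v in row) / m00

    def mu(p, q):
        # Central moment of order (p, q).
        return sum(((x - xc) ** p) * ((y - yc) ** q) * v
                   for y, row in enumerate(img)
                   for x, v in enumerate(row))

    # Normalised central moments of order 2: eta_pq = mu_pq / m00^2.
    eta20 = mu(2, 0) / m00 ** 2
    eta02 = mu(0, 2) / m00 ** 2
    eta11 = mu(1, 1) / m00 ** 2

    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2


def pad(img, dx, dy):
    """Shift the image right by dx and down by dy by adding blank pixels."""
    width = len(img[0]) + dx
    blank_rows = [[0] * width for _ in range(dy)]
    return blank_rows + [[0] * dx + list(row) for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


# Toy L-shaped stroke standing in for a binarised word image.
stroke = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print("original:", hu_first_two(stroke))
print("shifted: ", hu_first_two(pad(stroke, 3, 2)))
print("rotated: ", hu_first_two(rot90(stroke)))
```

In a full pipeline, values like these would be concatenated with the HOG descriptor of the same word image to form the feature vector passed to the classifier; that combination step is described in the abstract but its details are not.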


Pages: 74-94

Page count: 21
