Multilingual Character Segmentation and Recognition Schemes for Indian Document Images

Cited by: 45
Authors
Sahare, Parul [1]
Dhok, Sanjay B. [1]
Affiliations
[1] Visvesvaraya Natl Inst Technol, Dept Elect & Commun Engn, Ctr VLSI & Nanotechnol, Nagpur 440010, Maharashtra, India
Keywords
Character recognition; character segmentation; document analysis; graph theory; multilingual Indian optical character recognition; OCR system; scripts; transform; English
DOI
10.1109/ACCESS.2018.2795104
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents robust algorithms for character segmentation and recognition in multilingual Indian document images containing Latin and Devanagari scripts. Such documents typically suffer from complex layout organization, local skew, and low print quality, and they contain intermixed machine-printed and handwritten text. In the proposed character segmentation algorithm, primary segmentation paths are obtained from the structural properties of characters, whereas overlapped and joined characters are separated using graph distance theory. The segmentation results are then validated with a highly accurate support vector machine classifier. For the proposed character recognition algorithm, three new geometrical shape-based features are computed. The first and second features are formed with respect to the center pixel of the character, whereas the third is calculated from neighborhood information of the text pixels. The input character is recognized with a k-Nearest Neighbor classifier, chosen because it has intrinsically zero training time. Comprehensive experiments are carried out on different databases containing printed as well as handwritten text. Benchmarking results show that the proposed algorithms outperform other contemporary approaches, with highest segmentation and recognition rates of 98.86% and 99.84%, respectively.
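The remark that a k-Nearest Neighbor classifier has intrinsically zero training time can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_predict`, the two-dimensional feature vectors, the class labels, and the choice of Euclidean distance with k = 3 are all assumptions made for illustration, since the abstract does not specify the paper's three shape features or its distance measure.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored samples nearest to query.

    train: list of (feature_vector, label) pairs (hypothetical features).
    query: feature vector of the character to classify.
    """
    # No model is fitted: the labeled samples are simply stored,
    # and all computation happens when a query arrives.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two character classes.
train = [
    ((0.10, 0.20), "latin_A"),
    ((0.15, 0.25), "latin_A"),
    ((0.20, 0.10), "latin_A"),
    ((0.80, 0.90), "devanagari_ka"),
    ((0.85, 0.80), "devanagari_ka"),
    ((0.90, 0.95), "devanagari_ka"),
]

print(knn_predict(train, (0.12, 0.22)))
```

The trade-off in this sketch is that every query scans all stored samples, so the cost saved at training time is paid at recognition time.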
Pages: 10603-10617
Page count: 15