Recognition of printed Urdu ligatures using convolutional neural networks

Cited by: 7
Authors
Uddin, Israr [1]
Javed, Nizwa [2]
Siddiqi, Imran [1]
Khalid, Shehzad [1]
Khurshid, Khurram [2]
Affiliations
[1] Bahria Univ, Comp Sci Dept, Islamabad, Pakistan
[2] Inst Space Technol, iVis Lab, Elect Engn Dept, Islamabad, Pakistan
Keywords
optical character recognition; cursive scripts; ligatures; convolutional neural networks;
DOI
10.1117/1.JEI.28.3.033004
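Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;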
Abstract
We present a holistic technique for recognition of text in cursive scripts using printed Urdu ligatures as a case study. Convolutional neural networks (CNNs) are trained on high-frequency ligature clusters for feature extraction and classification. A query ligature presented to the system is first divided into primary and secondary ligatures that are separately recognized and later associated in a postprocessing step to recognize the complete ligature. Experiments are carried out using transfer learning on pretrained networks as well as by training a network from scratch. The technique is evaluated on ligatures extracted from two standard databases of printed Urdu text, Urdu printed text image (UPTI) and Center of Language Engineering (CLE), as well as by combining the ligatures of the two datasets. The system realizes high recognition rates of 97.81% and 89.20% on the UPTI and CLE databases, respectively. (C) 2019 SPIE and IS&T
Pages: 16