Offline Urdu Nastaleeq Optical Character Recognition Based on Stacked Denoising Autoencoder

Authors
Ahmad, Ibrar [1, 2]
Wang, Xiaojie [1]
Li, Ruifan [1]
Rasheed, Shahid [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, CIST, 10 Xitucheng Rd, Beijing 100876, Peoples R China
[2] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[3] PTCL, Islamabad 44000, Pakistan
Funding
National Natural Science Foundation of China
Keywords
offline printed ligature recognition; Urdu Nastaleeq; denoising autoencoder; deep learning; classification
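DOI
Not available
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract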
Offline Urdu Nastaleeq text recognition has long been a difficult problem because the script is highly cursive. To avoid character segmentation problems, many researchers are shifting their focus towards segmentation-free, ligature-based recognition approaches. The majority of prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are error prone and may often discard useful information that manually designed features can hardly recover later. Most prevalent Urdu Nastaleeq text recognition systems have also been trained and tested on small data sets. This paper proposes the use of a stacked denoising autoencoder for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders have been trained on 178,573 ligatures in 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. Subsequently, the trained networks are validated and tested on degraded versions of the UPTI data set. The experimental results demonstrate accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems for such a large dataset of ligatures.
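The idea of learning features from raw pixels with a stacked denoising autoencoder can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the noise type, layer sizes, loss, or optimizer, so masking noise, tied weights, squared-error loss, greedy layer-wise training, and every name and hyperparameter below (corrupt, DenoisingAutoencoder, train_stack, extract_features, the 16-pixel toy inputs) are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, level, rng):
    """Masking noise (an assumption): zero each entry with probability `level`."""
    mask = rng.random(x.shape) >= level
    return x * mask

class DenoisingAutoencoder:
    """One layer with tied weights (decoder uses W transposed)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)  # encoder bias
        self.c = np.zeros(n_in)      # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.c)

    def loss(self, x):
        """Mean squared reconstruction error on clean inputs."""
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))

    def train_step(self, x, level, lr, rng):
        """One full-batch gradient step: reconstruct clean x from corrupted x."""
        x_tilde = corrupt(x, level, rng)
        h = self.encode(x_tilde)
        x_hat = self.decode(h)
        n = x.shape[0]
        # Backpropagate 0.5 * squared error, averaged over the batch.
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat) / n
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        grad_W = x_tilde.T @ d_hid + d_out.T @ h  # encoder + decoder terms
        self.W -= lr * grad_W
        self.b -= lr * d_hid.sum(axis=0)
        self.c -= lr * d_out.sum(axis=0)

def train_stack(x, layer_sizes, level=0.3, lr=0.5, epochs=200, seed=0):
    """Greedy layer-wise training: each layer learns on the previous layer's codes."""
    rng = np.random.default_rng(seed)
    layers, inputs = [], x
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(inputs.shape[1], n_hidden, rng)
        for _ in range(epochs):
            dae.train_step(inputs, level, lr, rng)
        layers.append(dae)
        inputs = dae.encode(inputs)  # clean codes feed the next layer
    return layers

def extract_features(layers, x):
    """Pass raw pixel vectors through every trained encoder in turn."""
    for dae in layers:
        x = dae.encode(x)
    return x

# Toy stand-in for flattened ligature images: 64 samples of 16 binary pixels
# drawn from 4 hypothetical prototype patterns.
data_rng = np.random.default_rng(1)
prototypes = (data_rng.random((4, 16)) > 0.5).astype(float)
x = prototypes[data_rng.integers(0, 4, size=64)]

layers = train_stack(x, layer_sizes=[8, 4])
features = extract_features(layers, x)  # one 4-dimensional code per sample
```

In a full recognizer, the codes returned by extract_features would be passed to a classifier over the ligature classes, typically followed by supervised fine-tuning of the whole stack; that stage is omitted here.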
Pages: 146-157
Page count: 12
Related papers
50 in total
  • [1] Offline Urdu Nastaleeq Optical Character Recognition Based on Stacked Denoising Autoencoder
    Ibrar Ahmad
    Xiaojie Wang
    Ruifan Li
    Shahid Rasheed
    China Communications, 2017, 14 (01) : 146 - 157
  • [2] Urdu Nastaleeq Optical Character Recognition
    Ahmad, Zaheer
    Orakzai, Jehanzeb Khan
    Shamsher, Inam
    Adnan, Awais
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 26, PARTS 1 AND 2, DECEMBER 2007, 2007, 26 : 249 - 252
  • [3] Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks
    Ul-Hasan, Adnan
    Bin Ahmed, Saad
    Rashid, Sheikh Faisal
    Shafait, Faisal
    Breuel, Thomas M.
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 1061 - 1065
  • [4] The Application of Deep Convolutional Denoising Autoencoder for Optical Character Recognition Preprocessing
    Wiraatmaja, Christopher
    Gunadi, Kartika
    Sandjaja, Iwan Njoto
    2017 INTERNATIONAL CONFERENCE ON SOFT COMPUTING, INTELLIGENT SYSTEM AND INFORMATION TECHNOLOGY (ICSIIT), 2017, : 72 - 77
  • [5] Autoencoder Image Denoising to Increase Optical Character Recognition Performance in Text Conversion
    Alamsyah, Nur
    Fauzan, Mohamad Nurkamal
    Putrada, Aji Gautama
    Pane, Syafrial Fachri
    2022 INTERNATIONAL CONFERENCE ON ADVANCED CREATIVE NETWORKS AND INTELLIGENT SYSTEMS, ICACNIS, 2022, : 99 - 104
  • [6] Combining Offline and Online Preprocessing for Online Urdu Character Recognition
    Razzak, Muhammad Imran
    Hussain, Syed Afaq
    Sher, Muhammad
    Khan, Zeeshan Shafi
    IMECS 2009: INTERNATIONAL MULTI-CONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2009, : 912 - +
  • [7] Knowledge-based Stacked Denoising Autoencoder
    Liu G.-L.
    Yu J.-B.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (03) : 774 - 786
  • [8] Building Face Recognition System with Triplet-based Stacked Variational Denoising Autoencoder
    Le, Xuan Tuan
    SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019, : 106 - 110
  • [9] Stacked Denoising Autoencoder for Feature Representation Learning in Pose-Based Action Recognition
    Budiman, Arif
    Fanany, Mohamad Ivan
    Basaruddin, Chan
    2014 IEEE 3RD GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), 2014, : 684 - 688
  • [10] Offline Handwritten Malayalam character Recognition using stacked LSTM
    Jino, P. J.
    John, Jomy
    Balakrishnan, Kannan
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, INSTRUMENTATION AND CONTROL TECHNOLOGIES (ICICICT), 2017, : 1587 - 1590