Offline Urdu Nastaleeq Optical Character Recognition Based on Stacked Denoising Autoencoder

Cited by: 0
Authors
Ahmad, Ibrar [1 ,2 ]
Wang, Xiaojie [1 ]
Li, Ruifan [1 ]
Rasheed, Shahid [3 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, CIST, 10 Xitucheng Rd, Beijing 100876, Peoples R China
[2] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[3] PTCL, Islamabad 44000, Pakistan
Funding
National Natural Science Foundation of China;
Keywords
offline printed ligature recognition; Urdu Nastaleeq; denoising autoencoder; deep learning; classification;
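DOI
Not available
Chinese Library Classification number
TN [Electronic technology, communication technology];
Discipline classification code
0809;
Abstract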
Offline Urdu Nastaleeq text recognition has long been a difficult problem because the script is highly cursive. To avoid the problems of character segmentation, many researchers are shifting their focus towards segmentation-free, ligature-based recognition approaches. The majority of the prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are more error prone and may often lead to a loss of useful information that can hardly be recovered later by any manual features. Moreover, most of the prevalent Urdu Nastaleeq text recognition systems were trained and tested on small data sets. This paper proposes the use of stacked denoising autoencoders for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders have been trained on 178,573 ligatures with 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. Subsequently, the trained networks are validated and tested on degraded versions of the UPTI data set. The experimental results demonstrate accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems for such a large data set of ligatures.
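
The idea summarized in the abstract, learning features from raw pixels with stacked denoising autoencoders, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no architecture, so the masking noise, tied weights, sigmoid units, layer sizes, learning rate, and the synthetic binary "images" produced by the hypothetical helper `make_toy_images` are all assumptions made for illustration. The supervised classifier and fine-tuning stage that would follow pretraining are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One layer: corrupt the input, encode it, decode with tied weights."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)   # hidden (encoder) bias
        self.b_v = np.zeros(n_in)       # visible (decoder) bias
        self.corruption = corruption    # assumed fraction of pixels zeroed
        self.lr = lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def loss(self, x):
        """Mean cross-entropy reconstruction error on uncorrupted input."""
        z = np.clip(self.decode(self.encode(x)), 1e-7, 1 - 1e-7)
        return float(-np.mean(np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1)))

    def train_step(self, x):
        # Masking noise: zero a random subset of pixels, then reconstruct the clean x.
        mask = self.rng.random(x.shape) >= self.corruption
        x_tilde = x * mask
        h = self.encode(x_tilde)
        z = self.decode(h)
        d_v = (z - x) / len(x)               # gradient at decoder pre-activation
        d_h = (d_v @ self.W) * h * (1 - h)   # gradient at encoder pre-activation
        grad_W = x_tilde.T @ d_h + d_v.T @ h  # tied weights: encoder + decoder terms
        self.W -= self.lr * grad_W
        self.b_h -= self.lr * d_h.sum(axis=0)
        self.b_v -= self.lr * d_v.sum(axis=0)

def pretrain_stack(x, layer_sizes, epochs=50):
    """Greedy layer-wise pretraining: each layer trains on the previous layer's codes."""
    layers, inp = [], x
    for i, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(inp.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            dae.train_step(inp)
        layers.append(dae)
        inp = dae.encode(inp)  # clean encoding feeds the next layer
    return layers

def extract_features(layers, x):
    """Pass raw pixels through every trained encoder to obtain the top-level features."""
    for dae in layers:
        x = dae.encode(x)
    return x

def make_toy_images(n=200, n_pixels=64, n_classes=4, flip=0.05, seed=0):
    """Hypothetical stand-in for ligature images: noisy copies of binary prototypes."""
    rng = np.random.default_rng(seed)
    prototypes = (rng.random((n_classes, n_pixels)) < 0.5).astype(float)
    labels = rng.integers(0, n_classes, size=n)
    flips = (rng.random((n, n_pixels)) < flip).astype(float)
    return np.abs(prototypes[labels] - flips)  # XOR of prototype and flip mask

if __name__ == "__main__":
    x = make_toy_images()
    layers = pretrain_stack(x, layer_sizes=[32, 16])
    print(extract_features(layers, x).shape)
```

In a full recognizer, the features returned by `extract_features` would be passed to a classifier over the ligature classes, and the whole stack would typically be fine-tuned with labels; those steps are left out here because the abstract does not describe them.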
Pages: 146-157
Number of pages: 12
Related papers
50 in total
  • [21] Remote Sensing Image Classification Based on Stacked Denoising Autoencoder
    Liang, Peng
    Shi, Wenzhong
    Zhang, Xiaokang
    REMOTE SENSING, 2018, 10 (01):
  • [22] Electricity theft detection based on stacked sparse denoising autoencoder
    Huang, Yifan
    Xu, Qifeng
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2021, 125
  • [23] STACKED AUTOENCODER NETWORKS BASED SPEAKER RECOGNITION
    Zeng, Chun-Yan
    Ma, Chao-Feng
    Wang, Zhi-Feng
    Ye, Jia-Xiang
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 1, 2018, : 294 - 299
  • [24] Partial Discharge Patterns Recognition of GIS with Denoising-stacked Autoencoder Networks
    Zhao, Yiming
    Yan, Jing
    Wang, Yanxin
    Liu, Tingliang
    Jiang, Junjie
    2020 5TH ASIA CONFERENCE ON POWER AND ELECTRICAL ENGINEERING (ACPEE 2020), 2020, : 1815 - 1818
  • [25] Reverberant Speech Recognition Based on Denoising Autoencoder
    Ishii, Takaaki
    Komiyama, Hiroki
    Shinozaki, Takahiro
    Horiuchi, Yasuo
    Kuroiwa, Shingo
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3479 - 3483
  • [26] Classification of Alzheimer's Disease Based on Stacked Denoising Autoencoder
    Tong, Zheng-Lin
    Wang, Hai-Xing
    Yuan, Shao-Xun
    Sun, Xiao
    Xie, Jian-Ming
    2018 4TH ANNUAL INTERNATIONAL CONFERENCE ON NETWORK AND INFORMATION SYSTEMS FOR COMPUTERS (ICNISC 2018), 2018, : 248 - 253
  • [27] A Stacked Denoising Autoencoder Based on Supervised Pre-training
    Wang, Xiumei
    Mu, Shaomin
    Shi, Aiju
    Lin, Zhongqi
    SMART INNOVATIONS IN COMMUNICATION AND COMPUTATIONAL SCIENCES, VOL 2, 2019, 670 : 139 - 146
  • [28] A Finite State Model for Urdu Nastalique Optical Character Recognition
    Sattar, Sohail Abdul
    Shams-ul Haque
    Pathan, Mahmood Khan
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (09): : 116 - 122
  • [29] The optical character recognition of Urdu-like cursive scripts
    Naz, Saeeda
    Hayat, Khizar
    Razzak, Muhammad Imran
    Anwar, Muhammad Waqas
    Madani, Sajjad A.
    Khan, Samee U.
    PATTERN RECOGNITION, 2014, 47 (03) : 1229 - 1248
  • [30] Optical Character Recognition System for Urdu Words in Nastaliq Font
    Shabbir, Safia
    Siddiqi, Imran
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (05) : 567 - 576