Feature Extraction for Cursive Language Document Images Using Discrete Cosine Transform, Discrete Wavelet Transform and Gabor Filter

Cited by: 1
Authors
Siddiqui, Maria [1]
Siddiqi, Imran [2]
Khurshid, Khurram [1]
Affiliations
[1] Inst Space Technol, Dept Elect Engn, Islamabad 44000, Pakistan
[2] Bahria Univ, Dept Comp Sci, Islamabad 44000, Pakistan
Source
PROCEEDINGS OF THE 2ND MEDITERRANEAN CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (MEDPRAI-2018) | 2018
Keywords
Feature selection; Feature extraction; Urdu Word Spotting; Discrete Cosine Transform (DCT); Discrete Wavelet Transform (DWT); Gabor Filter;
DOI
10.1145/3177148.3180099
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The efficiency of any machine learning or computer vision system depends largely on the robustness of its feature extraction and selection process. In word spotting applications, many suitable features have been proposed in the literature over the years. Most of these features are extracted for Latin text but are used with Oriental scripts as well. Features that are more specific to Oriental text are also being investigated and have recently attracted considerable research attention; deep learning has been employed for this purpose as well. In this paper, we investigate the performance of shape-based features for Urdu script. Urdu and Arabic belong to the same family of scripts and share a similar alphabet. This means that features investigated on Urdu will give similar performance for Arabic as well as other Oriental scripts. For this paper, we have compiled results on approximately 21,000 ligatures belonging to 200 unique classes, taken from scanned pages of the popular Urdu series 'Zaawiyya'. This work is part of a project funded by the Higher Education Commission, which also provided the data set. The proposed system gives encouraging results, with a precision of 88.5% and a recall of 90.8%.
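The abstract names the Discrete Cosine Transform as one of the shape-based feature extractors but does not state how the coefficients are selected. As a rough illustration only, the sketch below computes a 2D DCT of a ligature image and keeps the low-frequency block as a feature vector. The function names `dct_matrix` and `dct_features`, the 8 x 8 block size, and the synthetic 32 x 32 test image are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # the DC row uses a different scale
    return c

def dct_features(image: np.ndarray, keep: int = 8) -> np.ndarray:
    """Return the keep x keep low-frequency 2D DCT coefficients, flattened.

    The block size `keep` is a hypothetical choice, not the paper's setting.
    """
    img = image.astype(np.float64)
    rows, cols = img.shape
    # Separable 2D DCT: transform the rows, then the columns.
    coeffs = dct_matrix(rows) @ img @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

# Hypothetical 32 x 32 binary "ligature": a filled rectangle stands in
# for a real scanned ligature image.
ligature = np.zeros((32, 32))
ligature[8:24, 10:20] = 1.0
features = dct_features(ligature, keep=8)
```

Keeping only the top-left block is a common way to summarise coarse shape with few values; a wavelet or Gabor feature extractor could be dropped into the same pipeline in place of `dct_features`.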
Pages: 84-87
Number of pages: 4