Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis

Cited by: 28
Authors
Aradhya, V. N. Manjunath [1]
Kumar, G. Hemantha [1]
Noushath, S. [1]
Affiliations
[1] Univ Mysore, Dept Studies Comp Sci, Mysore 570006, Karnataka, India
Keywords
document analysis; multi-lingual character recognition; South Indian languages; Fourier transform; principal component analysis (PCA)
DOI
10.1016/j.engappai.2007.05.009
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Character recognition lies at the core of the discipline of pattern recognition, where the aim is to represent a sequence of characters taken from an alphabet [Kasturi, R., Gorman, L.O., Govindaraju, V., 2002. Document image analysis: a primer. Sadhana 27 (Part 1), 3-22]. Though many kinds of features have been developed and their test performance on standard databases has been reported, there is still room to improve the recognition rate by developing improved features. In this paper, we present a multilingual character recognition system for printed South Indian scripts (Kannada, Telugu, Tamil and Malayalam) and English documents. South Indian languages are among the most popular languages in India and around the world. The proposed multilingual character recognition system is based on the Fourier transform and principal component analysis (PCA), two classical feature extraction and data representation techniques widely used in image processing, pattern recognition and computer vision. Our experimental results show good performance on the data sets considered. © 2007 Elsevier Ltd. All rights reserved.
Pages: 658-668
Number of pages: 11
Related papers
22 in total
  • [1] A font and size-independent OCR system for printed Kannada documents using support vector machines
    Ashwin, TV
    Sastry, PS
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2002, 27 (1) : 35 - 58
  • [2] Segmentation of touching and fused Devanagari characters
    Bansal, V
    Sinha, RMK
    [J]. PATTERN RECOGNITION, 2002, 35 (04) : 875 - 893
  • [3] OFF-LINE CURSIVE SCRIPT WORD RECOGNITION
    BOZINOVIC, RM
    SRIHARI, SN
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1989, 11 (01) : 68 - 83
  • [4] AN AUTONOMOUS READING MACHINE
    CASEY, RG
    NAGY, G
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 1968, C 17 (05) : 492 - +
  • [5] A complete printed Bangla OCR system
    Chaudhuri, BB
    Pal, U
    [J]. PATTERN RECOGNITION, 1998, 31 (05) : 531 - 549
  • [6] Chaudhuri BB, 1997, PROC INT CONF DOC, P1011, DOI 10.1109/ICDAR.1997.620662
  • [7] SWORDS: A statistical tool for analysing large DNA sequences
    Chaudhuri, P
    Das, S
    [J]. JOURNAL OF BIOSCIENCES, 2002, 27 (01) : 1 - 6
  • [8] CHARACTER-RECOGNITION - A REVIEW
    GOVINDAN, VK
    SHIVAPRASAD, AP
    [J]. PATTERN RECOGNITION, 1990, 23 (07) : 671 - 683
  • [9] Unconstrained handwritten character recognition based on fuzzy logic
    Hanmandlu, M
    Mohan, KRM
    Chakraborty, S
    Goyal, S
    Choudhury, DR
    [J]. PATTERN RECOGNITION, 2003, 36 (03) : 603 - 623
  • [10] JAWAHAR CV, 2003, P ICDAR 3 6 AUG, P656