Segmenting Characters from Malayalam Handwritten Documents

Authors
Hashrin, C. P. [1]
Jossy, Amal [1]
Sudhakaran, K. [1]
Thushara, A. [1]
John, Ansamma [1]
Affiliations
[1] TKM Coll Engn, Dept Comp Sci & Engn, Kollam, Kerala, India
Source
PROCEEDINGS OF 2019 1ST INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION AND COMMUNICATION TECHNOLOGY (ICIICT 2019) | 2019
Keywords
OCR; segmentation; recognition
DOI
10.1109/iciict1.2019.8741416
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Building an Optical Character Recognition (OCR) model for handwritten documents poses many challenges, the most prominent being dataset collection, character segmentation, and classification. This paper focuses on segmentation and presents a novel approach to segmenting individual characters from Malayalam handwritten documents. It is a three-stage approach in which morphological operations, contour analysis, and bounding box detection are used to extract individual lines from the document, then words from each line, and then characters from each word. An additional masking step is applied to handle bounding boxes that overlap because of skewed lines and diacritics. The segmented characters can be used either to create datasets or as input to OCR models.
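The three-stage idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no kernel sizes or masking details, so the function names (`dilate`, `components`, `extract`, `segment_page`), the radii `line_rx` and `word_rx`, and the toy page are all assumptions made for illustration. Connected-component labelling stands in for contour analysis, and the mask is simply the set of pixels in each dilated blob, which is one plausible reading of the masking step.

```python
from collections import deque

def dilate(img, ry, rx):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangular kernel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def components(img):
    """Return (box, pixels) for each 8-connected blob.

    box is (top, left, bottom, right), inclusive; pixels is a set of (y, x).
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), set()
            t, l, b, r = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                pixels.add((cy, cx))
                t, l, b, r = min(t, cy), min(l, cx), max(b, cy), max(r, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(((t, l, b, r), pixels))
    return found

def extract(img, ry, rx, key):
    """Dilate to merge nearby ink, then cut one masked crop per merged blob."""
    crops = []
    for (t, l, b, r), pixels in sorted(components(dilate(img, ry, rx)), key=key):
        # Assumed masking: ink inside the box but outside this blob is zeroed,
        # so an overlapping neighbour does not leak into the crop.
        crops.append([[img[y][x] if (y, x) in pixels else 0
                       for x in range(l, r + 1)] for y in range(t, b + 1)])
    return crops

def segment_page(img, line_rx, word_rx):
    """Lines (top to bottom) -> words (left to right) -> characters."""
    by_top = lambda c: c[0][0]
    by_left = lambda c: c[0][1]
    page = []
    for line in extract(img, 0, line_rx, by_top):
        words = [extract(word, 0, 0, by_left)
                 for word in extract(line, 0, word_rx, by_left)]
        page.append(words)
    return page

# Toy page: 'X' is ink. Line 1 has a two-character word and a one-character
# word; line 2 has a single two-character word.
PAGE = ["X.X...X",
        "X.X...X",
        ".......",
        "X.X....",
        "X.X...."]
img = [[1 if ch == "X" else 0 for ch in row] for row in PAGE]
page = segment_page(img, line_rx=3, word_rx=1)
print([[len(word) for word in line] for line in page])  # [[2, 1], [2]]
```

In this sketch the only difference between the three stages is the horizontal dilation radius: a wide radius fuses a whole line into one blob, a narrower one fuses only the characters of a word, and a radius of zero leaves individual characters separate. On real scans the radii would have to be tuned to the handwriting, and a library such as OpenCV would normally replace the hand-written dilation and labelling loops.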
Pages: 6