Segmenting Characters from Malayalam Handwritten Documents

Cited by: 0
Authors
Hashrin, C. P. [1]
Jossy, Amal [1]
Sudhakaran, K. [1]
Thushara, A. [1]
John, Ansamma [1]
Affiliations
[1] TKM Coll Engn, Dept Comp Sci & Engn, Kollam, Kerala, India
Source
PROCEEDINGS OF 2019 1ST INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION AND COMMUNICATION TECHNOLOGY (ICIICT 2019) | 2019
Keywords
OCR; segmentation; recognition
DOI
10.1109/iciict1.2019.8741416
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Construction of an Optical Character Recognition (OCR) model for handwritten documents poses many challenges, the most prominent of them being dataset collection, character segmentation and classification. This paper focuses on the segmentation part, and presents a novel approach to segment individual characters from Malayalam handwritten documents. It is a three-stage approach where morphological operations, contour analysis, and bounding box detection are used to extract individual lines from the document, words from each line, and then characters from each word. An additional masking method is performed to tackle the overlapping of bounding boxes due to skewed lines and the presence of diacritics. The segmented characters can either be used to create datasets or fed to OCR models.
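The three stages named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: connected-component labelling stands in for contour analysis, the kernel sizes are illustrative guesses, and copying only the ink that belongs to each merged blob is just one plausible reading of the masking step. All function names (`dilate`, `components`, `split`, `segment`) are hypothetical.

```python
from collections import deque

def dilate(img, kh, kw):
    """Binary dilation of a 0/1 grid with a kh x kw rectangle (odd sizes)."""
    h, w, rh, rw = len(img), len(img[0]), kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - rh), min(h, y + rh + 1)):
                    for xx in range(max(0, x - rw), min(w, x + rw + 1)):
                        out[yy][xx] = 1
    return out

def components(img):
    """8-connected components of a 0/1 grid, each a list of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(pixels)
    return found

def split(img, kh, kw, axis):
    """Merge nearby ink by dilation, then cut out one masked crop per blob.

    axis=0 orders the crops top to bottom, axis=1 left to right.
    """
    pieces = []
    for blob in components(dilate(img, kh, kw)):
        # Masking (assumed reading): keep only original ink inside this blob,
        # so ink from an overlapping neighbouring box does not leak in.
        ink = [(y, x) for y, x in blob if img[y][x]]
        top, left = min(y for y, _ in ink), min(x for _, x in ink)
        bottom, right = max(y for y, _ in ink), max(x for _, x in ink)
        piece = [[0] * (right - left + 1) for _ in range(bottom - top + 1)]
        for y, x in ink:
            piece[y - top][x - left] = 1
        pieces.append(((top, left), piece))
    pieces.sort(key=lambda p: p[0][axis])
    return [piece for _, piece in pieces]

def segment(page, line_kernel=(1, 9), word_kernel=(1, 3)):
    """Return lines -> words -> character crops for a binarised page."""
    return [
        [split(word, 1, 1, axis=1) for word in split(line, *word_kernel, axis=1)]
        for line in split(page, *line_kernel, axis=0)
    ]

# Toy page: '#' is ink. Line 1 holds a two-stroke word and a one-stroke word.
rows = ["#.#....#",
        "#.#....#",
        "........",
        "#.......",
        "#......."]
page = [[1 if c == "#" else 0 for c in row] for row in rows]
result = segment(page)
print([[len(word) for word in line] for line in result])  # [[2, 1], [1]]
```

A wide, flat kernel bridges the gaps between words so that each text line becomes one blob; a narrower one bridges only the gaps between characters. Real handwriting would need kernels tuned to the scan resolution, and a 1 x 1 character stage like this one would split a diacritic from its base character.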
Pages: 6
Related papers
50 in total
  • [31] Novel Approach for Segmentation of Handwritten Touching Characters from Devanagari Words
    Doiphode, Akshata
    Ragha, Leena
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 621 - +
  • [32] An overview on handwritten documents word spotting
    Boualam, Manal
    Khaissidi, Ghizlane
    Mrabti, Mostafa
    Elfakir, Youssef
    2019 INTERNATIONAL CONFERENCE ON WIRELESS TECHNOLOGIES, EMBEDDED AND INTELLIGENT SYSTEMS (WITS), 2019,
  • [33] Text line extraction from multi-skewed handwritten documents
    Basu, S.
    Chaudhuri, C.
    Kundu, M.
    Nasipuri, M.
    Basu, D. K.
    PATTERN RECOGNITION, 2007, 40 (06) : 1825 - 1839
  • [34] From Contours to Characters Segmentation of Cursive Handwritten Words with Neural Assistance
    Kurniawan, Fajri
    Rehman, Amjad
    Mohamad, Dzulkifli
    ICICI-BME: 2009 INTERNATIONAL CONFERENCE ON INSTRUMENTATION, COMMUNICATION, INFORMATION TECHNOLOGY, AND BIOMEDICAL ENGINEERING, 2009, : 153 - 156
  • [35] Text Line Extraction from Multi-skewed Handwritten Documents
    Jiang Yong
    Chen Xiaojing
    PROCEEDINGS OF THE 27TH CHINESE CONTROL CONFERENCE, VOL 4, 2008, : 412 - +
  • [36] Segmenting Handwritten Math Symbols Using AdaBoost and Multi-Scale Shape Context Features
    Hu, Lei
    Zanibbi, Richard
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 1180 - 1184
  • [37] Feature Extraction Using Geometrical Features for Malayalam Handwritten Character Recognition System
    Thushara, K.
    James, Ajay
    Saravanan, C.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 477 - 482
  • [38] Recognizing arabic handwritten characters using deep learning and genetic algorithms
    Balaha, Hossam Magdy
    Ali, Hesham Arafat
    Youssef, Esraa Khaled
    Elsayed, Asmaa Elsayed
    Samak, Reem Adel
    Abdelhaleem, Mohammed Samy
    Tolba, Mohammed Mosa
    Shehata, Mahmoud Ragab
    Mahmoud, Mahmoud Refa'at
    Abdelhameed, Mariam Mahmoud
    Mohammed, Mostafa Mahmoud
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (21-23) : 32473 - 32509
  • [39] Efficient Analysis of Vertical Projection Histogram to Segment Arabic Handwritten Characters
    El Mamoun, Mamouni
    Mahmoud, Zennaki
    Kaddour, Sadouni
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 60 (01) : 55 - 66
  • [40] Learning-based word spotting system for Arabic handwritten documents
    Khayyat, Muna
    Lam, Louisa
    Suen, Ching Y.
    PATTERN RECOGNITION, 2014, 47 (03) : 1021 - 1030