Improving Handwritten Arabic Text Recognition Using an Adaptive Data-Augmentation Algorithm

Cited by: 3
Authors
Eltay, Mohamed [1]
Zidouri, Abdelmalek [1]
Ahmad, Irfan [2]
Elarian, Yousef [3]
Affiliations
[1] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, Elect Engn Dept, Dhahran, Saudi Arabia
[2] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, Informat & Comp Sci Dept, Dhahran, Saudi Arabia
[3] Cambrian Coll, Sudbury, ON, Canada
Source
DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I | 2021, Vol. 12916
Keywords
Handwriting recognition; Deep Learning Neural Network; Data augmentation; Recurrent Neural Network; Connectionist temporal classification;
DOI
10.1007/978-3-030-86198-8_23
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has increased the performance of classification and object detection, but it generally requires large amounts of labeled data for training. In this paper, we introduce a new data augmentation algorithm that promotes diversity between classes, representing the characters of the Arabic script, and can balance samples between different classes. This algorithm gives each word in the lexicon a weight. The weight of a word is based on the occurrence probabilities of the characters constituting the word. Minority classes are given higher weight as compared to the classes frequently occurring in the text. The data augmentation technique was evaluated on a handwritten word recognition task using the publicly available IFN/ENIT and AHDB datasets. We see significant improvement in results by employing our data augmentation technique, and we achieve state-of-the-art results on both datasets.
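The abstract describes the weighting idea only in words, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a word's weight is the mean inverse occurrence probability of its characters, normalized over the lexicon, and that the weight is then used to split a fixed augmentation budget. The function names (`char_probabilities`, `word_weights`, `augmentation_quota`), the inverse-probability formula, and the toy Latin-letter lexicon are all hypothetical choices made for illustration.

```python
from collections import Counter


def char_probabilities(lexicon):
    """Estimate each character's occurrence probability over the lexicon."""
    counts = Counter(ch for word in lexicon for ch in word)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def word_weights(lexicon):
    """Assign each word a normalized weight (assumed formula, see lead-in).

    A word made of rare characters gets a higher weight than a word made
    of frequent characters. Weights sum to 1 over the distinct words.
    """
    words = sorted(set(lexicon))
    probs = char_probabilities(words)
    # Mean inverse character probability: rare characters contribute more.
    raw = {w: sum(1.0 / probs[ch] for ch in w) / len(w) for w in words}
    norm = sum(raw.values())
    return {w: v / norm for w, v in raw.items()}


def augmentation_quota(lexicon, budget):
    """Split a budget of synthetic samples across words by weight."""
    weights = word_weights(lexicon)
    return {w: round(budget * weight) for w, weight in weights.items()}


# Toy lexicon: 'a' is frequent, while 'x', 'y', 'z' are rare characters.
toy_lexicon = ["aaa", "aab", "xyz"]
weights = word_weights(toy_lexicon)
quota = augmentation_quota(toy_lexicon, budget=100)
```

In this toy setting, the word built from rare characters ("xyz") receives the largest share of the budget, which mirrors the abstract's statement that minority classes are given higher weight than frequently occurring ones.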
Pages: 322-335
Page count: 14