Open Vocabulary Arabic Handwriting Recognition Using Morphological Decomposition

Cited by: 16
Authors
Hamdani, Mahdi [1]
Mousa, Amr El-Desoky [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Aachen, Germany
Source
2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR) | 2013
Keywords
DOI
10.1109/ICDAR.2013.63
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Language Models (LMs) are an important component of large and open-vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approach decomposes Arabic words on the basis of morphological analysis. The vocabulary is a combination of words and sub-words obtained by the decomposition process. Out-Of-Vocabulary (OOV) words can be recognized by combining different elements from the lexicon. The recognition system is based on Hidden Markov Models (HMMs) with position- and context-dependent character models. An n-gram LM trained on the decomposed text is used along with the HMMs during the search. The approach is evaluated on two Arabic handwriting datasets. Two different types of experiments are conducted for two Arabic handwriting recognition tasks. The open-vocabulary approach yields an absolute improvement of up to 1% in Word Error Rate (WER) on the constrained task and matches the performance of the baseline system on the unconstrained one.
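The idea that OOV words are recovered by combining lexicon elements can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `+` marker convention (a trailing `+` for a prefix, a leading `+` for a suffix), the function name `recombine`, and the transliterated toy units are all assumptions made for illustration only.

```python
from typing import List


def recombine(units: List[str]) -> List[str]:
    """Join marked sub-word units back into full words.

    Assumed (hypothetical) convention: "w+" is a prefix that attaches to
    the following unit, "+hm" is a suffix that attaches to the preceding
    unit, and an unmarked unit is a stem or a full word.
    """
    words: List[str] = []
    glue_next = False  # True while a prefix is waiting for its stem
    for unit in units:
        core = unit.strip("+")
        if not core:
            continue  # ignore a bare marker
        is_suffix = unit.startswith("+")
        is_prefix = unit.endswith("+")
        if (glue_next or is_suffix) and words:
            words[-1] += core  # attach to the word being built
        else:
            words.append(core)  # start a new word
        glue_next = is_prefix
    return words


# Toy lexicon of words and sub-words (illustrative transliterations).
lexicon = {"w+", "ktAb", "+hm", "fy", "Al+", "byt"}
decoded = ["w+", "ktAb", "+hm", "fy", "Al+", "byt"]
words = recombine(decoded)
```

In this toy run every decoded unit is in the lexicon, yet the recombined form `wktAbhm` is not, which is the sense in which a word outside the vocabulary can still be produced.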
Pages: 280-284
Page count: 5
Related papers
12 in total
  • [1] [Anonymous], 2012, Guide to OCR for Arabic Scripts
  • [2] An omnifont open-vocabulary OCR system for English and Arabic
    Bazzi, I
    Schwartz, R
    Makhoul, J
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (06) : 495 - 504
  • [3] Beulen K., 1997, P 5 EUROPEAN C SPEEC, P1179
  • [4] Handwritten address recognition with open vocabulary using character n-grams
    Brakensiek, A
    Rottland, J
    Rigoll, G
    [J]. EIGHTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION: PROCEEDINGS, 2002 : 357 - 362
  • [5] Comparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition
    Doetsch, Patrick
    Hamdani, Mahdi
    Ney, Hermann
    Gimenez, Adria
    Andres-Ferrer, Jesus
    Juan, Alfons
    [J]. 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012 : 3 - 7
  • [6] El-Desoky A., 2009, Proc. Interspeech, P2679
  • [7] Habash O. R. Nizar, 2009, P 2 INT C AR LANG RE
  • [8] Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition
    Kanoun, Slim
    Alimi, Adel M.
    Lecourtier, Yves
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2011, 41 (02) : 579 - 590
  • [9] Large vocabulary off-line handwriting recognition: A survey
    Koerich, AL
    Sabourin, R
    Suen, CY
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2003, 6 (02) : 97 - 121
  • [10] KHATT: Arabic Offline Handwritten Text Database
    Mahmoud, Sabri A.
    Ahmad, Irfan
    Alshayeb, Mohammad
    Al-Khatib, Wasfi G.
    Parvez, Mohammad Tanvir
    Fink, Gernot A.
    Maergner, Volker
    El Abed, Haikal
    [J]. 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012 : 449 - 454