Statistical Machine Translation as a Language Model for Handwriting Recognition

Cited by: 7
Authors
Devlin, Jacob [1]
Kamali, Matin [1]
Subramanian, Krishna [1]
Prasad, Rohit [1]
Natarajan, Prem [1]
Affiliation
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
Source
13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012) | 2012
DOI
10.1109/ICFHR.2012.273
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language, and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we were able to obtain a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
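The reranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the feature names (`recognizer`, `ngram_lm`, `smt`), the weights, and all scores below are hypothetical, and the sketch simply assumes that each n-best hypothesis already carries log-domain scores that are combined as a weighted sum.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One recognizer output with log-domain feature scores (higher is better)."""
    text: str
    features: Dict[str, float]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Sort hypotheses by the weighted sum of their feature scores, best first."""
    def total(h: Hypothesis) -> float:
        return sum(w * h.features[name] for name, w in weights.items())
    return sorted(nbest, key=total, reverse=True)


# Hypothetical two-entry n-best list; the numbers are made up for illustration.
# "smt" stands in for the translation likelihood: a less negative value means
# the hypothesis was "easier" to translate.
nbest = [
    Hypothesis("hyp A", {"recognizer": -10.0, "ngram_lm": -12.0, "smt": -30.0}),
    Hypothesis("hyp B", {"recognizer": -10.5, "ngram_lm": -12.5, "smt": -20.0}),
]

# Hypothetical weights; in practice they would be tuned on held-out data.
baseline_weights = {"recognizer": 1.0, "ngram_lm": 0.5}
smt_weights = {"recognizer": 1.0, "ngram_lm": 0.5, "smt": 0.1}

best_baseline = rerank(nbest, baseline_weights)[0]
best_with_smt = rerank(nbest, smt_weights)[0]
```

With these made-up numbers, the baseline features alone prefer "hyp A", while adding the translation score moves "hyp B" to the top, which is the kind of reordering a reranking feature is meant to enable.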
Pages: 291-296
Number of pages: 6