Statistical Machine Translation as a Language Model for Handwriting Recognition

Cited by: 7
Authors
Devlin, Jacob [1]
Kamali, Matin [1]
Subramanian, Krishna [1]
Prasad, Rohit [1]
Natarajan, Prem [1]
Affiliations
[1] Raytheon BBN Technologies, Cambridge, MA 02138 USA
Source
13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012) | 2012
DOI
10.1109/ICFHR.2012.273
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model, which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we obtained a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
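The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the feature scores are combined, so the weighted (log-linear) sum below is an assumption, as are all names (`Hypothesis`, `rerank`, the weight keys) and all numeric scores, which are made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an N-best list (all fields are hypothetical)."""
    text: str
    recognizer_score: float  # log-likelihood from the handwriting recognizer
    ngram_lm_score: float    # log-probability under the n-gram LM
    mt_score: float          # log-likelihood of translating this hypothesis


def rerank(nbest, weights):
    """Return the hypothesis with the highest weighted sum of feature scores.

    Assumption: features are combined linearly in the log domain.
    """
    def total(h):
        return (weights["recognizer"] * h.recognizer_score
                + weights["ngram_lm"] * h.ngram_lm_score
                + weights["mt"] * h.mt_score)
    return max(nbest, key=total)


# Made-up scores: hypothesis B is slightly worse for the recognizer
# but much "easier" to translate (higher MT log-likelihood).
nbest = [
    Hypothesis("hypothesis A", -10.0, -20.0, -30.0),
    Hypothesis("hypothesis B", -10.5, -19.8, -25.0),
]

# Baseline: the MT feature is switched off.
baseline = rerank(nbest, {"recognizer": 1.0, "ngram_lm": 0.5, "mt": 0.0})
# With the MT feature: the translation score can change the ranking.
with_mt = rerank(nbest, {"recognizer": 1.0, "ngram_lm": 0.5, "mt": 0.2})

print(baseline.text)
print(with_mt.text)
```

In a real system the feature weights would be tuned on held-out data rather than set by hand; the values here only show how an additional feature can reorder the N-best list.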
Pages: 291-296
Page count: 6
Related papers
50 in total
  • [31] Statistical machine translation of subtitles for highly inflected language pair
    Maucec, Mirjam Sepesy
    Kacic, Zdravko
    Verdonik, Darinka
    PATTERN RECOGNITION LETTERS, 2014, 46 : 96 - 103
  • [32] MISTRAL: A Statistical Machine Translation Decoder for Speech Recognition Lattices
    Patry, Alexandre
    Langlais, Philippe
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1148 - 1153
  • [33] Applications of Statistical Machine Translation Approaches to Spoken Language Understanding
    Macherey, Klaus
    Bender, Oliver
    Ney, Hermann
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (04): : 803 - 818
  • [34] Machine Translation and Welsh: Analysing free Statistical Machine Translation for the professional translation of an under-researched language pair
    Screen, Ben
    JOURNAL OF SPECIALISED TRANSLATION, 2017, (28) : 317 - 344
  • [35] Towards incorporating language morphology into statistical machine translation systems
    Karageorgakis, P
    Potamianos, A
    Klasinas, I
    2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 80 - 85
  • [36] ON STATISTICAL MACHINE TRANSLATION METHOD FOR LEXICON REFINEMENT IN SPEECH RECOGNITION
    Xu, Haihua
    Xiao, Xiong
    Chng, Eng-Siong
    Li, Haizhou
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 25 - 29
  • [37] An Efficient Machine Translation Model for Dravidian Language
    Chandramma
    Pareek, Piyush Kumar
    Swathi, K.
    Shetteppanavar, Puneet
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 2101 - 2105
  • [38] On integrating a language model into neural machine translation
    Gulcehre, Caglar
    Firat, Orhan
    Xu, Kelvin
    Cho, Kyunghyun
    Bengio, Yoshua
    COMPUTER SPEECH AND LANGUAGE, 2017, 45 : 137 - 148
  • [39] Online Handwriting-Based Gender Recognition Using Statistical and Machine Learning Approaches
    Shin, Jungpil
    Uchida, Yuta
    Maniruzzaman, Md.
    Hirooka, Koki
    Megumi, Akiko
    Yasumura, Akira
    IEEE ACCESS, 2024, 12 : 93791 - 93801
  • [40] Linguistic Resources for Handwriting Recognition and Translation Evaluation
    Song, Zhiyi
    Ismael, Safa
    Grimes, Steven
    Doermann, David
    Strassel, Stephanie
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3951 - 3955