A language model using variable length tokens for open-vocabulary Hangul text recognition

Cited by: 1
Authors
Ryu, SH [1]
Kim, JH [1]
Affiliation
[1] Korea Adv Inst Sci & Technol, Div Comp Sci 373 1, Taejon 305701, South Korea
Keywords
language model; character recognition; hangul recognition; open-vocabulary; word recognition;
DOI
10.1016/j.patcog.2003.12.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel language model for Hangul text recognition. Without relying on prior linguistic knowledge in training, the proposed model learns variable-length Hangul character sequences, which serve as the elementary tokens of the Korean language, and their probabilities from the statistics of a raw text corpus. Experiments in handwritten Hangul recognition show that the proposed language model is effective for postprocessing recognition results. (C) 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Pages: 1549-1552
Page count: 4