Text line segmentation and word recognition in a system for general writer independent handwriting recognition

Cited by: 28
Authors
Marti, UV [1]
Bunke, H [1]
Affiliations
[1] Univ Bern, Inst Informat & Angew Math, CH-3012 Bern, Switzerland
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
Keywords
handwriting recognition; text line to word segmentation; word recognition; hidden Markov models;
DOI
10.1109/ICDAR.2001.953775
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system, which are preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Then each line of text is normalized with respect to skew, slant, vertical position and width. After these steps, text lines are segmented into single words. For this purpose distances between connected components are measured. Using a threshold, the distances are divided into distances within a word and distances between different words. A line of text is segmented at positions where the distances are larger than the chosen threshold. From each image representing a single word, a sequence of features is extracted. These features are input to a recognition procedure which is based on hidden Markov models. To investigate the stability of the segmentation algorithm, the threshold that separates intra- and inter-word distances from each other is varied. If the threshold is small, many errors are caused by over-segmentation, while for large thresholds under-segmentation errors occur. The best segmentation performance is 95.56% correctly segmented words, tested on 541 text lines containing 3899 words. Given a correct segmentation rate of 95.56%, a recognition rate of 73.45% on the word level is achieved.
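The threshold-based word segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each connected component is reduced to a horizontal extent `(x_start, x_end)` and that the "distance" is the horizontal gap between neighbouring extents, which the abstract does not specify. The names `segment_line` and `Box` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

# Assumption: a connected component is represented only by its horizontal
# extent (x_start, x_end) in pixels; the abstract does not define the distance.
Box = Tuple[int, int]


def segment_line(components: List[Box], threshold: int) -> List[List[Box]]:
    """Group connected components into words by thresholding horizontal gaps."""
    if not components:
        return []
    ordered = sorted(components)          # left-to-right order
    words = [[ordered[0]]]
    right_edge = ordered[0][1]            # rightmost pixel seen so far
    for box in ordered[1:]:
        gap = box[0] - right_edge
        if gap > threshold:
            words.append([box])           # inter-word gap: start a new word
        else:
            words[-1].append(box)         # intra-word gap: extend current word
        right_edge = max(right_edge, box[1])
    return words


# Hypothetical text line: gaps of 2 and 3 inside words, 12 between words.
line = [(0, 10), (12, 20), (32, 40), (43, 50)]
for t in (1, 5, 20):
    print(t, len(segment_line(line, t)))
```

On this toy line, a threshold of 1 splits every component apart (over-segmentation), a threshold of 5 yields the two intended words, and a threshold of 20 merges everything into one word (under-segmentation), mirroring the trade-off the abstract reports when the threshold is varied.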
Pages: 159-163
Page count: 5