Discovering similar Chinese characters in online handwriting with deep convolutional neural networks

Cited by: 0
Authors
Shuye Zhang
Lianwen Jin
Liang Lin
Affiliations
[1] South China University of Technology, School of Electronic and Information Engineering
[2] Sun Yat-Sen University, School of Data and Computer Science
Source
International Journal on Document Analysis and Recognition (IJDAR) | 2016 / Vol. 19
Keywords
Similar character; Confidence; Similarity measurement; Convolutional neural network; Similar character pairs/sets
DOI
Not available
Abstract
A primary cause of performance degradation in unconstrained online handwritten Chinese character recognition is the subtle difference between similar characters. Various methods have been proposed in previous work to address the problem of generating similar characters. These methods generally comprise two components: similar character discovery and cascaded classifiers. The goal of similar character discovery is to make the similar character pairs/sets cover as many misclassified samples as possible. We observe that the confidence of a convolutional neural network (CNN) is produced in an end-to-end manner and can be interpreted as a type of probability metric. In this paper, we propose an algorithm that leverages CNN confidence to discover similar character pairs/sets. Specifically, a deep CNN outputs the top-ranked candidates and their corresponding confidence scores, which are then accumulated and averaged. We found experimentally that the number of similar character pairs varies from class to class, as does the degree of confusion within each pair. To address these problems, we propose an entropy-based similarity measurement to rank the similar character pairs/sets and reject those with low similarity. The experimental results indicate that, using 30,000 similar character pairs, our method achieves hit rates of 98.44% and 98.05% on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.0–1.2 datasets, respectively, significantly higher than the corresponding results of the MQDF-based method (95.42% and 94.49%). Furthermore, recognizing ten randomly selected similar character subsets with a two-stage classification scheme yields a relative error reduction of 30.11% compared with the traditional single-stage scheme, demonstrating the potential of the proposed method.
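The accumulate-and-average step and the entropy-based ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the function names, the top-k cutoff `k`, the binary-entropy score in `pair_entropy`, and the rejection `threshold` are all assumptions chosen for illustration. The input is assumed to be one row of softmax confidences per sample, together with the true class labels.

```python
import math
from collections import defaultdict


def average_topk_confidence(confidences, labels, k=3):
    """For each true class, accumulate the confidence given to each of the
    top-k candidate classes, then average over that class's samples.
    (k is a hypothetical parameter; the abstract does not state its value.)"""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for probs, label in zip(confidences, labels):
        counts[label] += 1
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        for candidate in ranked[:k]:
            sums[label][candidate] += probs[candidate]
    return {
        label: {c: total / counts[label] for c, total in row.items()}
        for label, row in sums.items()
    }


def pair_entropy(p_self, p_other):
    """Assumed similarity score: binary entropy (in bits) of the two averaged
    confidences after renormalizing them to sum to 1. A value near 1 means the
    two classes are almost equally likely, i.e. highly confusable."""
    if p_self <= 0 or p_other <= 0:
        return 0.0
    q = p_self / (p_self + p_other)
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def rank_similar_pairs(avg_confidence, threshold=0.0):
    """Score every (class, candidate) pair, reject pairs whose score does not
    exceed the threshold, and return the rest sorted by descending score."""
    pairs = []
    for label, row in avg_confidence.items():
        p_self = row.get(label, 0.0)
        for other, p_other in row.items():
            if other == label:
                continue
            score = pair_entropy(p_self, p_other)
            if score > threshold:
                pairs.append((label, other, score))
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Toy data: four samples, three classes (made-up confidences).
toy_confidences = [
    [0.60, 0.35, 0.05],
    [0.50, 0.45, 0.05],
    [0.10, 0.85, 0.05],
    [0.02, 0.03, 0.95],
]
toy_labels = [0, 0, 1, 2]

toy_average = average_topk_confidence(toy_confidences, toy_labels, k=2)
toy_pairs = rank_similar_pairs(toy_average, threshold=0.3)
```

In this toy setting, a class whose samples split their confidence almost evenly between two candidates receives a high score, while a class recognized with near-certainty falls below the threshold and is rejected, which mirrors the ranking-and-rejection role the abstract assigns to the similarity measurement.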
Pages: 237-252
Page count: 15